perm filename KRD6.MSS[PEG,DBL]2 blob sn#481080 filedate 1979-06-30 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00050 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00006 00002	.S(Knowledge Acquisition II,learning new primitives)
C00011 00003	.SS(Key ideas:#  Overview,BIO:)
C00018 00004	.SS(The fundamental problem,FUNDPROB:)
C00021 00005	.SS(Sources of difficulty,SOURCEDIFF:)
C00030 00006	.SS(The solution,SOLN:)
C00034 00007	.SS(Key ideas:# Comments,BIC:)
C00035 00008	.SSS(Vocabulary,POV:)
C00037 00009	.SSS(Schemata as knowledge representation descriptions,SAKRD:)
C00040 00010	.ind data structure interrelationships
C00042 00011	.SSSS(Extensions--data structure interrelationships,EXTDSI:)
C00045 00012	.SSS(A "totally typed" language,TT:)
C00053 00013	.SSS(Knowledge base integrity,KBI:)
C00057 00014	.SSS(Summary,SUM:)
C00059 00015	.SKIP TO LINE 1 TRACESEC( Acquiring new values)
C00063 00016	.BEGIN "TRACE" STARTRACE
C00073 00017	.SKIP TO LINE 1 SSS(Acquisition of a new culture site)
C00081 00018	.SKIP TO LINE 1 SS(Knowledge about representations:# Organization,KARO:)
C00084 00019	.SSS(The schema hierarchy,SHIERARCHY:)
C00092 00020	.SSS(Schema organization,SCHO:)
C00096 00021	.SSSS(|Instance structure|)
C00099 00022	.skip to line 1 STARTFIGSELECT 6TURN ON "→↓_"GROUP SKIP 6
C00103 00023	.SKIP TO LINE 1
C00108 00024	.SSSS(Interrelationships)
C00114 00025	.SSSS(Current instances)
C00117 00026	.SKIP TO LINE 1 SSS(Slotnames and slotexperts,SLOTN:)
C00124 00027	.SSSS(|Slotnames as data structures, "circularity" of the formalism|)
C00126 00028	.SKIP TO LINE 1  SS(Knowledge about representations:# Use,KARU:)
C00131 00029	.SSSS(Adding to the structure of the new concept)
C00135 00030	.SSSS(Attending to data structure interrelations,SFNDSI:)
C00155 00031	.SSS(Where to start in the network,WTS:)
C00163 00032	.SSS(Schema function:# Access and storage,SFAS:)
C00169 00033	.SKIP TO LINE 1 TRACESEC( Acquiring a new attribute,NATT:)
C00172 00034	.BEGIN "TRACE" STARTRACE
C00183 00035		Let's take a moment out to review what's happened so far and to see
C00191 00036	.BEGIN "TRACE" STARTRACE
C00195 00037	.SSS(Comments on the trace)
C00205 00038	.SS(Knowledge about knowledge about representations,KAKAR:)
C00214 00039	.SSS(The SCHEMA-SCHEMA,SCHSCH:)
C00218 00040	.SKIP TO LINE 1 STARTFIGturn on "↓_" << the schema schema >>
C00221 00041	.SKIP TO LINE 1  TRACESEC( Building the schema network,BSN:)
C00233 00042	.SSS(Comments on the trace,COTT:)
C00239 00043	.SS(Levels of knowledge,LOK:)
C00243 00044	.SSS(Level of detail)
C00245 00045	.SSS(Level of generality)
C00251 00046	.SSS(Impact)
C00255 00047	.SS(Limitations)
C00267 00048	.SS(Future work)
C00273 00049	.SS(Summary,SUM6:)
C00278 00050	.SSS(Current capabilities,CURCAP:)
C00283 ENDMK
C⊗;
.S(Knowledge Acquisition II,learning new primitives);
.BEGINSMALLQUOTE;TURN ON "→";indent 0,0,0;
→Yes, but I see that even your own words miss the mark....
.ENDSMALLQUOTE(|%2Oedipus the King%*, line 324|);

.SS(Introduction);
	The techniques described in the previous chapter make it possible for
the expert to teach the system new rules, expressed in terms of known concepts.
But this capability alone would be insufficient for any substantial education
of the system since gaps in the knowledge base might require rules dealing
with concepts not yet known to the system.  This
chapter describes how the expert can teach the system new
conceptual primitives and new types of conceptual primitives.∪∪A "new
conceptual primitive" means a new instance of one of the 13 primitives listed in
{YON2 BKGNDRHLL}.  A "new type of conceptual primitive" refers to
teaching the system about a new kind of primitive in addition to the existing
13.∪
	This capability
requires dealing with a new set of problems, in addition to
those faced earlier.  There will, in particular, be a greater emphasis on
the manipulation of data structures in the knowledge base.  Acquisition of
new rules dealt with a single type of structure, one which was understood in
terms of a combination of available primitives.  There was thus a single,
uniform process for acquisition and integration, with an emphasis on
understanding and interpreting the English text.  Here, in the acquisition
of new conceptual primitives,
it is necessary to
deal with a wide range of data structures, each of which may have its own
requirements for integration into the knowledge base.  The problem thus has
two major aspects to it: (%2i%*) knowledge acquisition and (%2ii%*)
knowledge base management.  In response, the techniques used address both
the difficulties presented by the knowledge transfer process and the
general issues of constructing and maintaining a large collection of data
structures.
	This chapter is divided into three main parts.  The first part
introduces the idea of a %2data structure schema%*, a device for describing
representations, and contains the bulk of the discussion about it.  It
begins with a general overview of the fundamental problems attacked and the
basic ideas used.  It then continues with traces that show how && directs the
acquisition of a new value for an attribute, demonstrating a simple
example of "filling out" an existing schema to produce a new instance.
{YON1 KARO} and {YON1 KARU} then discuss the organization and use of the
knowledge carried in the schemata.
	{YON1 NATT} starts the second part with an example and an explanation
of the acquisition
of a new attribute.  This part demonstrates the use of the schemata on
more complex data structures and indicates how a new schema can be
acquired using the same techniques employed for acquiring new instances.
	The last part begins with {YON1 BSN}, which describes how to start
the knowledge acquisition process when building an entirely new knowledge
base.  It shows an example of &&'s performance on this task.

.SS(Key ideas:#  Overview,BIO:);
	The discussion below necessarily includes a certain amount of detail
concerning both the schemata and the internal implementation of data
structures.  It also ranges over a large number of topics and
examines steps toward solving many of the problems.   To insure
that the more important ideas are not lost in the mass of detail, they are
summarized below and labelled with  the section in which they
first appear.
	The most basic observation we make is that
.ind knowledge about representations
.BEGINlist; 
%1(1)\By supplying a system with a store of knowledge about its own
representations, both knowledge acquisition and knowledge base management
can be carried out in a high-level dialog that transfers information
relatively easily.  {YON1 SOLN}
.SKIP 1;
.endlist;

.CONTINUE;
Further observations dealing with the store of knowledge about representations
include:
.ind extended data type
.BEGINlist;
%1(2)\Each of the conceptual primitives (knowledge representations) from which
rules (and other structures) are built will be viewed as an extended data
type.  Each such extended data type is described by a %2data structure
schema%*, a record-like structure augmented with additional information
.ind schema
(such as data structure interrelations).  {YON1 SOLN}
.SKIP 1;
%1(3)\The schemata provide a language and mechanism for describing representations
and hence offer a way of expressing a body of knowledge about them.  {YON2 SAKRD}
.SKIP 1;
%1(4)\The body of knowledge is organized around the representational primitives in use
(such as, in this case, the notion of %2attribute%*, %2object%*, %2value%*,
etc.).  {YON2 SCHO}
.SKIP 1;
%1(5)\Knowledge is represented as a collection of prototypes (the schemata).
{YON1 KARO}
.SKIP 1;
%1(6)\Knowledge can be viewed in terms of different levels of generality:
(%2i%*)##schema instances, (%2ii%*)##schemata, and (%2iii%*)##"schema-schema."# {YON1 LOK}
.SKIP 1;
%1(7)\The techniques we use gain a certain degree of generality by keeping the
knowledge carefully stratified according to those levels.  {YON1 LOK}
.SKIP 1;
%1(8)\The set of schemata can itself be organized into a generalization hierarchy.
{YON2  SHIERARCHY}.
.SKIP 1;
.endlist;

.CONTINUE;
.ind knowledge base management
Observations dealing with knowledge base management include:
.BEGINlist;
.ind extended data type
.ind knowledge representation
%1(9)\It is useful to consider the terms %2data structure%*, %2extended data
type%*, and %2knowledge representation%* as interchangeable.  {YON2 POV}
.SKIP 1;
%1(10)\The system is "totally typed" in the sense that ideas (4) and (5) above
are applied exhaustively to all representations and data structures in the
system.  {YON2  TT}.
.SKIP 1;
%1(11)\Unlike ordinary record structures or declarations, the schemata are a part
of the system itself and are available to the system for examination.
{YON2 TT}
.SKIP 1;
.endlist;

.CONTINUE;
Ideas relevant to knowledge acquisition include the suggestions that:
.BEGINlist;
%1(12)\Knowledge acquisition can proceed by interpreting the information in the
schemata as a set of instructions for the construction and maintenance of the
relevant knowledge representations.  Hence it is the process of schema
instantiation that drives knowledge acquisition.  {YON1 KARU}
.SKIP 1;
%1(13)\Doing knowledge acquisition via the schemata offers a certain level of
knowledge base integrity.  {YON2 KBI}
.SKIP 1;
%1(14)\Acquisition of a new %2instance%* of an existing conceptual primitive is
structured as a process of descent through the schema hierarchy noted
above.  {YON2 SHIERARCHY}
.SKIP 1;
%1(15)\Acquisition of a new %2kind%* of conceptual primitive is structured as a
process of adding new branches to the schema hierarchy.   {YON1 NATT}
.ENDlist; 

.SS(The fundamental problem,FUNDPROB:);
.ind knowledge base management
	Viewed from the  perspective of knowledge representation and
knowledge base management, the problem of acquiring a new conceptual
primitive can be seen in terms of adding a new instance of an extended data
type to a large program.  Using the standard approach, a programmer
attempting this task would have to gather a wide range of information,
including the structure of the data type and its interrelations with other
data types in the program.  Such information is typically recorded
informally (if at all) and is often scattered through a range of sources; it
might be found in comments in program code, in documents and manuals
maintained separately, and in the mind of the program architect.  Just
finding all of this information can be a major task, especially for someone
unfamiliar with the program.
	In this situation, two sorts of errors are common:# The new instance
may be given the wrong structure or it may be improperly integrated into
the rest of the program.  Since an extended data type may be built from a
complex collection of components and pointers, it is not uncommon that a
new instance receives an incorrect internal organization, that extraneous
structures are included, or that necessary elements are inadvertently
omitted.  Since data structures in a program are not typically independent,
the addition of a new instance often requires significant effort to
maintain the existing interdependencies.  Errors can result from doing this
incorrectly, by violating the interrelationships of existing structures 
or (as is more common) by omitting a necessary bookkeeping step.
.SS(Sources of difficulty,SOURCEDIFF:);
	A basic source of difficulty in solving the problem of acquiring a
new conceptual primitive arises from our desire to deal with the issues
noted above (insuring correct structure for a newly added primitive and
maintaining existing interrelationships) in the context of the global goals
set out at the beginning.  That is, a nonprogrammer should be able to build
the knowledge base and be able to assemble large amounts of knowledge.  The
first of these goals means that the user cannot (nor should he be expected
to) deal with the system at the level of data structures; he  needs a
dialog at a higher level.  This is accomplished by having && take care of
the "details" and having the user supply only domain-specific information.
	The second global goal--building a large knowledge base--brings
with it the problem of complexity.  There is a well-known phenomenon, present
in all programs but most obvious in large systems, referred to as the
"1 + epsilon bug":  A change introduced to fix a known bug may
result, on the average, in the creation of %2more than%* one new bug.  The
system may thus be inherently unstable, since any attempt to repair a
problem may introduce more problems than it repairs.
	Complexity arises in our case primarily because of size: the
size of the performance program's knowledge base and the wealth of detail
involved in fully describing its knowledge representations.  There is, for
example, a large number of different data types, each with its own
structural organization, its own set of interrelations with other data
types, and its own set of requirements for integration into the program.
There is also a large number of instances of each data type.  Since
modifications to a data type design have to be carried out on all of its
instances, the efficient retrieval and processing of this set is another
problem that involves the management of large numbers of structures.
	In order to make the acquisition of new conceptual primitives possible
in the context of our original goals, then, we have to provide the user with
a system that carries on a high-level dialog and that keeps track of the
numerous details of data structure implementation.  The first of these design
requirements will insure that the system is comprehensible to the user;
the second will insure that new bugs are  not inadvertently created while fixing old
ones.
	Note that this emphasis on the necessity of avoiding bugs during the
process of acquiring new primitives is really no different from the view
presented
earlier in discussing explanation and rule acquisition.  When dealing with
rules, we noted that the large amount of knowledge required for high
performance makes shortcomings in the knowledge base inevitable, and we
emphasized the benefits of using these shortcomings
to provide the context and focus for knowledge acquisition.  We will
similarly use shortcomings in the knowledge base to provide the context for
the acquisition of new conceptual primitives.

	In both cases, we must avoid introducing errors during the process
of knowledge acquisition.  This is of greater concern during acquisition of
new conceptual primitives for reasons arising out of the nature of
the errors encountered and the objects involved.
	As noted earlier, conceptual primitives are each individually far
more complex in their structure than rules.  In addition, there is only a
single rule format, while there are many different conceptual primitive 
structures.  There is, thus, far greater opportunity for error.
	In addition, whereas the rules are designed to be fundamentally
independent, the data structures used to represent conceptual primitives are
often interrelated in subtle ways, and the character of the errors produced
is very different.  The independence of rules means that their interaction
during a consultation can be understood by a simple model in which the
contribution of each rule is considered individually.  This is manifestly
untrue of complex data structures, where errors in format can result in
subtle interactions.
	Finally, there is the issue of the
conceptual level of the objects being manipulated, that is,  their likely
familiarity to the expert.  It seems reasonable to assume that
domain-specific rules will deal with knowledge sufficiently familiar to the expert
that he will be able to  understand the program in these terms.  We do not assume that
he is familiar enough with data structures and representations to be able
to manipulate or debug them. 
	In summary, recall the distinction drawn in chapter 1 between
expertise and formalism.  Because rules are designed to have a sharply
constrained degree of interaction and to  be comprehensible to the
expert, errors in the knowledge base may properly be considered
shortcomings of expertise.  The complex interrelations of data structures
used to represent conceptual primitives and the subtle nature of the bugs
they produce put them in the domain of errors of formalism.  In dealing
with the creation of new conceptual primitives, therefore, strong emphasis
is placed on techniques that assure a high degree of integrity.

.SS(The solution,SOLN:);
	In the simplest terms, the solution we suggest is to give the
.ind knowledge about representations
system a store of knowledge about its representations and the capability
to use this knowledge as a basis for their construction and management.
.ind knowledge representation
	In more detail:# We view every knowledge representation in the
.ind extended data type
system as an extended data type.  Explicit descriptions of each data type
are written, descriptions that include all the information about structure
and interrelations that was noted earlier as often being
widely scattered.  Next,
we devise a language in which all of this information can be put in
machine-comprehensible terms and write the descriptions in those terms,
making this store of information available to the system. Finally, we
design an interpreter for the language so that the system can use its new
knowledge to keep track of the details of data structure construction and
maintenance.
This is, of course, easy to say but somewhat harder to do.  Some
difficult questions arise:
.BEGINQUOTE; SELECT 2;
%2What knowledge about its representations
does a system require in order to allow it to do a range of nontrivial
management tasks?  How should this knowledge be organized?  How should it
be represented?  How can it be used?%*
.ENDQUOTE;
All these issues are dealt with below.  We demonstrate, for
instance, that the relevant knowledge includes information about the
structure and interrelations of representations and show that it can be
used as the basis for the interactive transfer of domain-specific expertise.
	The main task here, then, is the description and use of knowledge
about representations.  To accomplish this, we use a %2data structure
schema%*, a device that provides a framework and language in which
representations can be specified.  The framework, like most, carries its
own perspective on its domain.  One point it emphasizes strongly is the
detailed specification of many kinds of information about representations.
It attempts to make this specification task easier by providing an
organization for the information and a relatively high-level vocabulary for
its expression.
	Note that the schemata form the second major example of meta-level
.ind meta-level knowledge
knowledge.  While a particular data structure may be used to represent an
object in the domain, the schemata (as descriptions of representations) are
meta-level objects.

.SS(Key ideas:# Comments,BIC:);
	To provide some background for understanding the examples of
system performance that follow, we present below some brief comments
on several of the ideas listed in {YON1 BIO}.

.SSS(Vocabulary,POV:);
.ind extended data type
	In the discussion that follows, the terms %2data structure%*, %2extended data
type%*, and %2representation%* will be used interchangeably.  Equating the
first two implies extending the idea of data types to cover every data
structure in a system.  The utility of this view appears to be widely
accepted and, in the case at hand, will influence our approach to
determining what information about data structures is relevant and how that
information should be organized.
	The equivalence of the last two suggests our perspective 
on the design and implementation of knowledge representations.  These
two tasks--design and implementation--are typically decoupled, and, indeed,
the desirability of transparency of implementation has been stressed from
many quarters (e.g., [[Bachman75], [[Balzer67], [[Liskov74]).  But what
might we learn by considering them simultaneously?  That is, what can we
learn about representation design by considering issues that arise at the
level of implementation and technical detail?  Conversely, what can we
learn about the organization or design of data types by viewing them as
knowledge representations?  We examine these questions below.

.SSS(Schemata as knowledge representation descriptions,SAKRD:);
	As noted, the schemata are the primary vehicle for describing
representations.  They were developed as a generalization of the concept of
record structures and strongly resemble them in both organization and use.
Many of the operations with the schemata can be seen in terms of variations
on the task of creating a new instance of a record-like structure.  We will
see that these operations proceed in a mixed-initiative mode:  The need to
add a new data structure is made evident by an action on the part of the
user; && then takes over, retrieving the appropriate schema and using it to
guide the rest of the interaction.
	The schemata take from records the concept of structure
description, the separation of representation from implementation, and the
fundamental record creation operation.  Records provide a simple language
for describing data structures, and this was used as the basis for the
structure syntax in the schemata.  Records also isolate conceptual
structures from details of implementation.  Thus, code may uniformly refer
to field F of record R despite changes in the way the record is actually
stored.  Finally, the operation of creating a new instance of a record was
used as the fundamental paradigm for this part of the knowledge
acquisition task.  At the global level, much that happens in this chapter
can be viewed in terms of creating instances from one or more kinds of
records.
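	To make the record analogy concrete, the sketch below shows how access
through a fixed field name can be insulated from the underlying storage.  It is
written in modern Python rather than the {PRLANG LISP} of the actual
implementation, and the names are purely illustrative.
.STARTFIG;
# A minimal sketch of the record idea: callers refer to field F of record R
# through an accessor, so the storage convention can change without
# disturbing any of that code.  (Illustrative only.)

class Record:
    def __init__(self, **fields):
        self._fields = dict(fields)    # storage convention: a dictionary

    def get(self, name):
        return self._fields[name]      # only this accessor knows the convention

r = Record(SITE="BLOOD", IDENT="E.COLI")
print(r.get("SITE"))                   # -> BLOOD
.ENDFIG;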

.ind data structure interrelationships
.SSSS(Extensions--data structure syntax,EXTDSS:);
	The basic idea of a record-like descriptor was then extended to
make possible the capabilities we require.  The structure syntax was
extended by adopting some of the conventions of BNF, so that a certain
variability could be described.  For instance, a schema can indicate that a
structure has %2a minimum of 1, a maximum of 4, and typically 2%* components
of a given form.∪∪This variability is what led  to calling them
%2schemata%*, rather than declarations or records, since the latter
typically describe structures with fixed formats.∪
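	To give a rough picture of this variability (again in Python, with purely
illustrative names, rather than the schema syntax itself, which is shown later
in this chapter), such a constraint amounts to a (typical, minimum, maximum)
triple checked against the proposed components:
.STARTFIG;
# Sketch: a component description allowing a variable number of occurrences
# -- a minimum of 1, a maximum of 4, and typically 2.  Names are hypothetical.

VARIABLE_COMPONENT = {"typical": 2, "minimum": 1, "maximum": 4}

def count_ok(components, constraint):
    """True if the number of components falls within the allowed range."""
    return constraint["minimum"] <= len(components) <= constraint["maximum"]

print(count_ok(["A", "B"], VARIABLE_COMPONENT))    # -> True
print(count_ok([], VARIABLE_COMPONENT))            # -> False (below the minimum)
.ENDFIG;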

.SSSS(Extensions--data structure interrelationships,EXTDSI:);
	In addition, we introduced a syntax of data structure
interrelations.  As noted above, data structures in a program are not
typically independent and the addition of a new instance of some data type
to the system often requires extensive bookkeeping to maintain the existing
interdependencies.
	This problem has been considered previously, primarily with
techniques oriented around demon-like mechanisms (e.g., the demons in
{PRLANG QA4} [[Rulifson72]).  The approach taken here differs in several
respects. While, as in previous approaches, demon-like mechanisms were
employed to help model the domain, they will also be  used
extensively at the level of data structures, as a tool to aid in management
of the knowledge base.  They  become an important component of our
representation methodology and will be seen to have an influence on the
organization of knowledge in the system.
	Previous uses of demons have also involved the full power of the
parent programming language, as in {PRLANG QA4} or {PRLANG PLANNER}, where
the body of a demon can be an arbitrary computation.  For reasons which
will become clear later, significant effort was put into avoiding this
approach.  We have instead developed a small syntax of interrelationships
that expresses the relevant facts in a straightforward form.
	The fundamental point here is simple enough:# Whatever the
interrelationships, they should be made explicit.  All too often, the
interdependencies of internal data structures are left either as folklore
or, at best, mentioned briefly in documentation. In line with the major
themes of this work, we want to make this knowledge explicit and 
accessible to the system itself.  The interrelationship syntax was the tool
employed to do this.

.SSS(A "totally typed" language,TT:);
	The basic approach, then, was to view the representation primitives
as extended data types in a high-level language, use an augmented
record-like structure to describe each of them, and then make those
structures available for reference by the system itself.  The next step was
to apply this exhaustively and uniformly to every object in the system.
That is, the "language" should be "totally typed," and every object in the
system should be an instance of some schema.  One reason for this is data
base integrity.  A totally typed language makes possible
.ind type checking
exhaustive type checking 
and  one level of knowledge base integrity.  In addition,
since many of the extended data types correspond to domain-specific
objects, the knowledge acquisition dialog can be made to appear to the
expert to be phrased in terms of objects in the domain, while to the system
it is a straightforward manipulation of data structures.  It thus helps 
bridge the gap in perspectives.  Finally, since we were concerned with the
large amount of knowledge about representations that is typically left
implicit, applying the schema idea to every object in the system offered
some level of assurance that we had made explicit some significant
fraction of this information.
	Exhaustive application of the schema idea has several
implications.  First, it means that even the components from which a schema
is built should also be instances of some (other) schema, and we will see
that this is true.  Second, since we claim both that the schemata should be a
part of the system and that every object in the system should be an
instance of some schema, then the schemata themselves should be an instance
of something.  In more familiar terms,  %2if the structure
declarations are to be objects in the program, and if everything is to be a
data type of some sort, then the declarations themselves must be a data
type%*. This was done.  There is a "schema-schema," which specifies
the structure of a schema, and all schemata are instances of it.
	Since the schema-schema indicates the structure of a schema, it can
be used to guide the creation of new data types.  This offers the same
benefits as before, of a certain level of integrity and a relatively
"high-level" dialog.  Note that it deals, however, with the fairly
sophisticated task of specifying a new data structure.
	While the recursive application of the schema idea was motivated
initially by purely utilitarian considerations, it led  to a useful
uniformity in &&:# There is a single process by which the schema-schema can
be instantiated to create a new schema (a new knowledge representation)
and by which a schema can be instantiated to create a new instance of an
existing knowledge representation.  This not only made  possible 
bootstrapping the system (described later in this chapter) but also
supplied much of the generality of the approach.  Part of {YONFIG KAOVKA} has
been reproduced below to illustrate this multi-level organization.
.STARTFIG;
.BOXFIG;
		      KNOWLEDGE ACQUISITION
	    ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃
	    }                                       }
	    } ⊂π∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂π⊃                   }
	    } }}2              }}∂ ∂ ∂ ∂ ∂ ⊃        }
	    } }} schema-schema }}          ↓        }
	    } α%∀∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∀$ ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃  }
	    }         ⊂ ∂ ∂ ∂ ∂ ∂ } new schema   }  }
	    }         ↓           } acquisition  }  }
	    } ⊂π∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂π⊃ α%∂∂∂∂∂∂∂∂∂∂∂∂∂∂$  }
	    } }}1              }}∂ ∂ ∂ ∂ ∂ ⊃        }
	    } }}    schemata   }}          ↓        }
 KNOWLEDGE  } α%∀∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∀$ ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃  }
   BASE     }                     } new instance }  }
⊂π∂∂∂∂∂∂π⊃  }                ⊂ ∂ ∂} acquisition  }  }← ∂ EXPERT
}}0     }}←∂}∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ $    α%∂∂∂∂∂∂∂∂∂∂∂∂∂∂$  }[dialog]
}} facts}}  }    [knowledge                         }
}} -----}}  }     transfer]                         }
}} rules}}  }                                       }
}}      }}  }                                       }
α%∀∂∂∂∂∂∂∀$  }                                       }
            }                                       }
            }                                       }
            }                                       }
            }                                       }
            }                                       }
            }                                       }
            }                                       }
            α%∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$

.FIG(The multi-level process of acquiring new primitives);
.ENDFIG;
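	The uniformity can be suggested with a small sketch, written in modern
Python with hypothetical names and deliberately simplified slots; the actual
mechanism, described later in this chapter, is of course driven by the schemata
themselves.  The point is only that one instantiation routine serves at both
levels.
.STARTFIG;
# Sketch of the two-level uniformity: the same routine creates a new schema
# from the schema-schema and a new instance from a schema.
# (Modern Python; names, slots, and contents are hypothetical.)

SCHEMA_SCHEMA = {"slots": ["PNTNAME", "PLIST", "FATHER", "OFFSPRING"]}

def instantiate(schema, answers):
    """Fill each slot named by the schema from the supplied answers."""
    return {slot: answers.get(slot) for slot in schema["slots"]}

# Level 2 to level 1: instantiating the schema-schema yields a new schema...
ident_schema = instantiate(SCHEMA_SCHEMA,
                           {"PNTNAME": "IDENT-SCHEMA",
                            "PLIST": ["GRAM", "MORPH"],
                            "FATHER": "VALUE-SCHEMA"})

# ...level 1 to level 0: instantiating that schema (treating its PLIST
# entries as its slots, a simplification) yields a new instance.
ident_schema["slots"] = ident_schema["PLIST"]
e_coli = instantiate(ident_schema, {"GRAM": "GRAMNEG", "MORPH": "ROD"})
print(e_coli)    # -> {'GRAM': 'GRAMNEG', 'MORPH': 'ROD'}
.ENDFIG;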

.SSS(Knowledge base integrity,KBI:);
.ind knowledge base integrity
	Avoiding bugs when manipulating data structures is also known as
"assuring the integrity of the data base" and has been investigated within
the framework of several organizational paradigms (see e.g., 
[[McLeod76] and [[Eswarn75]).  Previous efforts have emphasized the utility
of extensive type checking for extended data types and have studied aspects of
integrity specific to a particular paradigm.  We use many of these same
techniques here, but focus on the problem of interrelationships between
data structures in general  and concentrate on dealing with the effects of
additions on the integrity of the knowledge base.
	While it has not been possible to devise ways of assuring the total
integrity of the knowledge base, the capabilities of our system can be
broadly classified by considering three error sources.  First, the system
can assure a form of completeness, by making sure both that the expert is
reminded to supply every necessary component of a structure and that all
other appropriate structures are  informed of  the newly added item
("informed" is elaborated below).  Second, it can assure "syntactic"
integrity.  There is complete type checking, and no interaction with the
expert will result in incorrect data types in the knowledge base.
	Finally, it can assure a certain level of "semantic" integrity.
The semantics of any individual data structure will be properly maintained,
so that, for instance, a new attribute  will be given all the
descriptors appropriate to it, in the correct form.  It can also assure
some semantic consistency in two or more related structures, but this is as
yet incomplete, since some inconsistencies can arise that require more
knowledge about the domain than is currently available.  For instance, in
the medical domain, while describing a new organism a user might indicate
that it is an "acid-fast coccus" ("acid-fast" describes a response to a
kind of stain) when, in fact, the combination is biologically meaningless.
Each individual answer is correct but the combination is inconsistent for
reasons that are not easily represented.
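	The "syntactic" check is easy to picture.  The sketch below (modern
Python, with hypothetical names; the actual check is driven by the blank found
in the relevant schema) shows the kind of test that rejects ROD as a gramstain
in the trace later in this chapter:
.STARTFIG;
# Sketch of the "syntactic" integrity check: a proposed entry is accepted only
# if it is a recognized instance of the expected data type.  (Modern Python;
# the set of legal values is hypothetical and would grow as the knowledge
# base is educated.)

LEGAL_GRAMSTAINS = {"GRAMPOS", "GRAMNEG"}       # instances known so far

def check_gramstain(answer):
    if answer in LEGAL_GRAMSTAINS:
        return "ok"
    return f"{answer} is not a recognized <gramstain>"

print(check_gramstain("ROD"))        # -> ROD is not a recognized <gramstain>
print(check_gramstain("GRAMNEG"))    # -> ok
.ENDFIG;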

.SSS(Summary,SUM:);
	The schemata and their associated structures provide a language
and framework in which representations can be specified.  It should be
emphasized that all of the work reported here was at the level of the design
and the implementation of this language and framework in &&.  Some of the
representations that the language can describe are those used in the
current performance program ({SYSTM MYCIN}); later sections of this chapter
examine the limits of its expressive power.  Within those limits, the
system deals with the general issue of the design and specification of
representations.  Nothing here is specific to medicine or to the
attribute-object-value representations that we will see employed.  Within
the range of representations that  our framework permits, the system is
domain independent and has a degree of representation independence as
well.  This generality results from the isolation and stratification of the
three different levels of knowledge in the system, discussed in detail in
{YON1 LOK}.

.SKIP TO LINE 1; TRACESEC( Acquiring new values);
	Two examples--the acquisition of new values for organism identity
and for culture site--will provide an overview of &&'s capabilities.
This demonstration uses a version of the performance program with a very simple
knowledge base, as it might appear in an early stage of development when
it contains only a few attributes and a few values for each.
	Some preliminary comments should be made about these examples.
	First, since we will be dealing with some complex data structures from a
specific performance program, much of what happens in the trace derives
from implementation conventions that are part of that program.  Since &&'s
acquisition process has to be thorough, it takes care of all of them.  The
important point to note is not what these conventions are but that && can
deal with them.
	Second, the dialogs are at times deceptively simple. This is in
part some measure of success, since we have managed to delegate much of the
detail to &&, which takes care of it quietly in the background.  To see
this point most clearly, consider, after reviewing the traces, the amount of
work that would be needed to do the same tasks by hand:# There are a number
of details of system construction that would have to be memorized and a
significant amount of effort expended to create and edit the structures by
hand.  Much more work would be involved if it were necessary to teach the
expert (who may never have programmed before) how to do the same things.
&& attends to the details, does most of the work, and maintains a
relatively high-level dialog.

.SSS(Acquisition of a new organism identity,NEWORG:);
	The first example shows how && guides the process of describing a
new kind of organism. It demonstrates primarily the ability of the schemata
to keep track of all the structural details in a data structure.  It also
illustrates some of the human engineering features of the system.

.BEGIN "TRACE" STARTRACE;
.STARTCOM;
{The expert is in the middle of a normal rule acquisition sequence when he
mentions an organism name that the system has never encountered before.}
.ENDCOM;

The new rule will be called RULE384
  If 1 -%3 THE SITE OF THE CULTURE IS BLOOD%*
     2 -%3 THE PORTAL OF ENTRY OF THE ORGANISM IS THE%*
	%3URINARY TRACT%*
     3 -%3 THE PATIENT HAS NOT HAD A GENITO-URINARY%*
	%3 MANIPULATIVE PROCEDURE%*
     4 -%3 CYSTITIS IS NOT A PROBLEM FOR WHICH THE%*
	%3 PATIENT HAS BEEN TREATED%*
     5 - 
 Then
     1 -%3 THE ORGANISM IS LIKELY (.4) TO BE AN E.COLI%*

Sorry, but I don't understand
   THE ORGANISM IS LIKELY (.4) TO BE AN E.COLI
because these words are unknown:  E.COLI
.STARTCOM;
{&& finds it cannot generate any interpretations of the text, so it
reprints the original line and specifies where the trouble is.  This makes
it possible for the expert to figure out what it is that is new to the
system.}
.ENDCOM;

Would you care to try rephrasing that?
++** %3Y%*
(Please use *'s to mark what you think might be a new attribute
 or a new value of a attribute.)

        1 -%3  THE ORGANISM IS LIKELY (.4) TO BE AN * E.COLI *%*
.STARTCOM;
{While the language in this example is very simple, there are more complex
cases where it would be quite hard to decide from the context what sort of
new item the expert had in mind.  To avoid the necessity for more
sophisticated natural language handling, he is asked to mark off the new
word or phrase with asterisks#... }
.ENDCOM;

It looks as though
     E.COLI
refers to a new value of an attribute, correct?
++** %3Y%*
 ...is it a new value of the identity of a organism?
++** %3Y%*
.STARTCOM;
{and with the new item clearly marked, the system can then do some very
simple pattern matching to decide what's going on.}
.ENDCOM;

  Now please tell me a few things about it...
.STARTCOM;
{&& retrieves the schema for an organism identity and "reads" it
to find out what it needs to know about any organism.}
.ENDCOM;

  Please give the full, formal name for "E.COLI"
  ++** %3ESCHERICEA COLI%*
  = ESCHERICEA-COLI
.STARTCOM;
{Many long medical terms have abbreviations, so the system requests the
formal name first.  (Hyphens are inserted to make a single identifier out
of it.)}
.ENDCOM;

  Now please give all synonyms or abbreviations for ESCHERICEA-
  COLI which you would like the system to accept:
  [type an empty line when done]
  ++** %3E.COLI%*
  ++**
.STARTCOM;
{The system requests synonyms and abbreviations so that they can be used in
the future.  The first of them is then used through the rest of the
dialogue.}
.ENDCOM;

  Please give a short description of E.COLI as a identity of a
  organism.
  ++** %3?%*
  [e.g., for PSEUDOMONAS, the description is "AN ORGANISM
   COMMONLY FOUND IN THE HOSPITAL, QUITE OFTEN A SOURCE OF
   SECONDARY INFECTIONS IN PATIENTS WHOSE IMMUNE RESPONSE IS
   WEAKENED"]
  [type an empty line when done]
  ++** %3NORMAL FLORA IN THE LOWER INTESTINE, BUT A%*
  ++** %3CAUSE OF SEPTICEMIA WHEN TRANSMITTED TO THE%*
  ++** %3BLOOD STREAM%*
  ++**
.STARTCOM;
{Two useful capabilities are displayed here.  First, the expert is asked to
describe every structure he creates, providing a form of on-line
.ind documentation
documentation for all data structures.  Since the documentation is acquired
incrementally along with the rest of the knowledge base, the task is much
easier than attempting to document an entire system at once.
	Second, && can generate sample answers for any of its questions.
In some previous session, the expert had evidently taught the system about
pseudomonas.  Since each schema has an indication of all of the instances
that were created from it, the system picks one of those instances and
displays the answer to this question for that instance.}
.ENDCOM;

  What is the aerobicity of a E.COLI?
  The answer should be in the form of 1 or more of the 
  following-
           <aerobicity> <certainty factor>
  [Type 1 set to a line, then an empty line when done.]
  ++** %3FACULTATIVE .8%*
  ++** %3AEROBIC .2%*
  ++**
.STARTCOM;
{Each of the questions from the system is prompted by the attempt to fill
in some component of the data structure being built, according to the
representation conventions indicated by the schema.  One of those
conventions indicates that  aerobicity, gramstain, and morphology are
stored as part of an organism.  The schema also indicates the format for
each piece of substructure, and this is displayed as instructions to the
user.}
.ENDCOM;

  What is the gramstain of a E.COLI?
  The answer should be in the form of a
           <gramstain>
  [Type 1 set to a line, then an empty line when done.]
  ++** %3ROD%*
    Sorry, but the following are invalid -
        ROD is not a recognized <gramstain>
    Please answer again [use the same answer if you really 
    meant it.]
  ++** %3GRAMNEG%*
  ++**
.STARTCOM;
{The formatting information also allows a check on the validity of each
answer, to insure that all information added to the knowledge base is
properly structured.  This time the expert made a mistake. It can happen,
however, that the answer is correct but the performance program simply
hasn't heard of it yet.  Early in its "education," the knowledge base may,
for instance, not yet have information about all the gramstain values, and
this might become evident in the course of teaching it about a new
organism.  Examples of this are found in additional traces later in this
chapter, which demonstrate that && sets up new subtopics as required.}
.ENDCOM;

  What is the morphology of a E.COLI?
  The answer should be in the form a
           <morphology>
  [Type 1 set to a line, then an empty line when done.]
  ++** %3ROD%*
  ++**

[Adding E.COLI to ORGANISMS]
.STARTCOM;
{In addition to creating the new structure, it's necessary to add it to an
internal list called %AORGANISMS%2. The message is printed simply to
indicate that the proper step has been taken.}
.ENDCOM;

  Ok, done with E.COLI now...
  Back to the rule.

This may take a few moments.
.STARTCOM;
{Acquisition of the rule can then continue as before.}
.ENDCOM;
.END "TRACE"
.SKIP TO LINE 1; SSS(Acquisition of a new culture site);
	Having taught the performance program about e.coli, the expert might later
start adding rules about the urinary tract and for the first time mention
urine as a culture site. The next example shows how this would proceed and
demonstrates &&'s handling of a fairly complex set of data structure
interrelationships.
.BEGIN "TRACE" STARTRACE;

The new rule will be called RULE384
  If 1 -%3 THERE IS NO HISTORY OF PYELONEPHRITIS%*
     2 -%3 THE ORGANISM WAS CULTURED FROM THE URINE%*
     3 -%3 THERE IS NO HISTORY OF RECURRENT UTI'S%*
     4 -
 Then
     1 -%3 THE ORGANISM IS LIKELY (.3) TO BE E.COLI%*


Sorry, but I don't understand
   THE ORGANISM WAS CULTURED FROM THE URINE
because these words are unknown:  URINE

Would you care to try rephrasing that?
++** %3Y%*
  (Please use *'s to mark what you think might be a new 
   attribute or a new value of an attribute)
        3 -%3  THE SITE OF THE CULTURE IS * URINE *%*
It looks as though
     URINE
refers to a new value of an attribute, correct?
++** %3Y%*
 ...is it a new value of the site of a culture?
++** %3Y%*

  Now tell me a few things about it...

  Please give the full, formal name for "URINE"
  ++** %3URINE%*

  Now please give all synonyms or abbreviations for URINE
  which you would like the system to accept:
  [type an empty line when done]
  ++** 

  Please give a short description of URINE as a culture site.
  [type an empty line when done]
  ++** %3THERE ARE SEVERAL METHODS OF OBTAINING URINE%*
  ++** %3SPECIMENS, SOME MORE LIKELY TO PRODUCE STERILE%*
  ++** %3RESULTS. BECAUSE OF THE LARGE POSSIBILITY OF%*
  ++** %3CONTAMINATION, CULTURES ARE NOT CONSIDERED%*
  ++** %3SIGNIFICANT UNLESS COLONY COUNT IS 100,000 OR%*
  ++** %3MORE.%*
  ++** 

  Should this new item be added to-
       1 - those sites that are normally sterile, or
       2 - those sites that are normally nonsterile, or
       3 - those sites from which the sterility of cultures 
	   depends upon the method of collection
  Select one of the above
  ++** %33%*

[Adding URINE to OTHERSITES]
.STARTCOM;
{Here we begin to encounter the interactions with other data structures.  The
first problem is to find out whether the site is to be considered sterile,
nonsterile, or indeterminate, so that it can then be added to the proper
internal collection of sites.}
.ENDCOM;

[Adding URINE to ALLSITES]
.STARTCOM;
{The schema also indicates that there is an internal list (called ALLSITES)
that
contains all the sites.  The system adds the new site to it without having to ask.}
.ENDCOM;

  Please indicate the infections which can occur at the 
  site: URINE.  The answer should be in the form of 0 or 
  more of the following-
           <an infection name>
  [Type 1 set to a line, then an empty line when done.]
  ++** %3ACUTE-PYELONEPHRITIS%*
  ++** %3PYELITIS%*
  ++** %3URETHRITIS%*
  ++** %3CYSTITIS%*
  ++**
.STARTCOM;
{A third data structure keeps track of which infections occur at which sites.}
.ENDCOM;

  Should this new item be added to -
         - those sites for which a positive culture implies a 
	   specific infection
  ++** %3N%*
.STARTCOM;
{A fourth contains the list of all sites for which a positive culture
implies a specific infection.}
.ENDCOM;

  Please indicate the methods of collection generally
  associated with a culture taken from the URINE, and the
  likelihood that the culture will be sterile.
  The answer should be in the form of 0 or more of the 
  following-
    <the method of collection of a culture> <certainty factor>
  [Type 1 set to a line, then an empty line when done.]
  ++** %3VOIDED .7%*
  ++** %3FOLEY-CATH-ASPIRATION 1.0%*
  ++** %3FOLEY-BAG-DRAINAGE .9%*
  ++** %3URETERAL 1.0%*
  ++** 
.STARTCOM;
{The fifth keeps track of the various ways samples can be collected
at different sites and notes their likely sterility.}
.ENDCOM;

  What are the likely pathogens to be found at the site: URINE?
  The answer should be in the form of 0 or more of the 
  following-
      <identity of an organism> <certainty factor>
  [Type 1 set to a line, then an empty line when done.]
  ++** %3E.COLI .3%*
  ++** %3PSEUDOMONAS .2%*
  ++** %3ENTEROCOCCUS .2%*
  ++** %3PROTEUS .2%*
  ++** %3KLEBSIELLA .1%*
  ++** 
.STARTCOM;
{And the last keeps track of which organisms are typically found at specific
sites. Note that all of this information is required for every new site that is
added to the system.}
.ENDCOM;

  Ok, done with URINE now...
  Back to the rule.
.STARTCOM;
{Having finished with the new site, && returns to parsing the new rule.}
.ENDCOM;

This may take a few moments.

.END "TRACE"
.SKIP TO LINE 1; SS(Knowledge about representations:# Organization,KARO:);
	We have emphasized earlier that performance of the acquisition
system (&&) depends on an extensive store of knowledge about the
representations used by the performance program ({SYSTM MYCIN}).  There are
several kinds of information involved ({YONFIG KLEVELS} below).  First, a
.ind generalization hierarchy
generalization hierarchy links the schemata together, indicating what
categories of data structures exist in the system and the relationships
between them.  Second, there are the individual schemata, each indicating
the structure and interrelationships of a single type of data structure.
Finally, there are the "slotnames" (and associated structures) from which
the schemata are built; these offer knowledge about specific conventions at
the programming-language level.  Each of these supplies a different sort of
information; together they compose an extensive body of knowledge about the
structure and organization of the representations.
.STARTFIG;

schema hierarchy  %1-- indicates categories of representations and%A
		  %1    interrelations%A
individual schema %1-- describes structure of a single representation%A
slotnames         %1-- the schema building blocks, describe implementation%A
		  %1    conventions

.FIG (Types of knowledge about representations,KLEVELS:);
.ENDFIG;CONTINUE;

.SSS(The schema hierarchy,SHIERARCHY:);
	The schemata are organized into a generalization hierarchy that has
several useful properties.  Part of the hierarchy for the current
performance program is shown in the figure below.∪∪The schemata for
%2blank%*, %2advice%*, %2slotname%*, and the remainder of the primitives in
{YON2 BKGNDRHLL} each form a branch of the network one level below
%AKSTRUCT-SCHEMA%*.  They are omitted here for simplicity.∪
	%AKSTRUCT-SCHEMA%* (knowledge structure) simply provides a root for
the network; its schema is empty.  Below it are the schemata for value and
attribute, and each of these is further subdivided into more specific
schemata.  The right branch of the network illustrates the fact that a
schema can have more than one parent.
.STARTFIG;
		 KSTRUCT-SCHEMA

   VALUE-SCHEMA			  ATTRIB-SCHEMA


SITE-       IDENT-      	   
SCHEMA      SCHEMA


              PTATTRIB-   INFATTRIB-   CULATTRIB-   ORGATTRIB-
	      SCHEMA	  SCHEMA       SCHEMA	    SCHEMA




		      SVA-SCHEMA  MVA-SCHEMA  TFA-SCHEMA

.FIG(Part of the schema hierarchy,SHIER:);
.ENDFIG;
	The major contribution of the hierarchy is as an organizing
mechanism that offers a convenient overview of all the representations in
the system.  It also indicates their global organization.  The right branch
above, for instance, indicates that there are two different breakdowns of the
set of attributes:# one containing four categories,∪∪The attributes can be
classified according to which object they are an attribute of (e.g., patient,
infection, culture, organism).∪ the other containing three
categories.∪∪They can also be broken down into "single-valued,"
"multiple-valued," and "true/false" types.  Single-valued attributes can
have only one value known with certainty (e.g., an organism can have only a
single identity that has a CF of 1.0), while multiple-valued attributes can
have more than one (e.g., there may be more than one drug to which the
patient is definitely allergic).  The final category contains attributes
that ask questions answered by "yes" or "no" (e.g., "Did the organism grow
in the aerobic bottle?").∪ As will be illustrated further on, acquisition
of a new instance of a conceptual primitive is, in part, a process of descent
through this hierarchy, so it provides a useful structuring of the
acquisition dialog.
	Since the acquisition of new types of conceptual primitives is
viewed as a process of adding new branches to this network, it is important
that network growth be reasonably smooth and convenient.  Later sections
will demonstrate that it does, in fact, arise as a natural part of enlarging
the knowledge base and that this new growth is automatically reflected
afterward in future dialogs.
.ind inheritance of properties
	In the network, extensive use is made  of the concept of inheritance
of properties.  The left branch above, for instance, indicates that culture
site and organism identity are more specific categories of the data type
%AVALUE%*.  All of the characteristics that site and identity have in
common as %AVALUE%*s are stored in the %AVALUE-SCHEMA%*.  Thus the
structure description part of the %AVALUE-SCHEMA%* (shown in the next
section) describes the structural components that are common to all
%AVALUE%*s.  The network then branches at this point because an organism
identity is a different type of data structure from a culture site, and
differs in some details of structure.  As the next section  illustrates,
this inheritance of properties is used for all the different types of
information stored in the schema.
	This hierarchical distribution of information also offers some
handle on the issue of the level of abstraction at which data types are
described, since the hierarchy stores at each level only those details
relevant to that particular level.∪∪The schema hierarchy can also
be seen as a data structure version of the sort of hierarchy often
represented with the %2class%* construct in {PRLANG SIMULA} [[Dahl70].∪
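	A sketch of how such inheritance might be realized (in modern Python,
with hypothetical, much-simplified schema contents; the actual mechanism uses
the %AFATHER%* links stored in each schema, shown in the next section) is simply
a walk up the parent chain, accumulating the structure information found at
each level:
.STARTFIG;
# Sketch of inheritance of properties through the schema hierarchy: the full
# description of an organism identity is gathered by following FATHER links
# from IDENT-SCHEMA to the root, merging the property-list descriptions found
# along the way.  (Modern Python; contents are simplified and hypothetical.)

SCHEMATA = {
    "KSTRUCT-SCHEMA": {"FATHER": None, "PLIST": {}},
    "VALUE-SCHEMA":   {"FATHER": "KSTRUCT-SCHEMA",
                       "PLIST": {"DESCR": "STRING", "AUTHOR": "ATOM"}},
    "IDENT-SCHEMA":   {"FATHER": "VALUE-SCHEMA",
                       "PLIST": {"GRAM": "GRAM-INST", "MORPH": "MORPH-INST"}},
}

def full_description(name):
    """Merge PLIST descriptions from a schema and all of its ancestors."""
    merged = {}
    while name is not None:
        schema = SCHEMATA[name]
        for slot, blank in schema["PLIST"].items():
            merged.setdefault(slot, blank)   # the nearer schema takes precedence
        name = schema["FATHER"]
    return merged

print(full_description("IDENT-SCHEMA"))
# -> includes GRAM and MORPH from IDENT-SCHEMA plus DESCR and AUTHOR
#    inherited from VALUE-SCHEMA
.ENDFIG;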
	While it is not evident from the segment of the schema network
shown above, functions constitute a branch of the network. Included there,
for instance, are the predicate functions used in rules.  We are thus
viewing functions as another type of data structure.  Restated in 
{PRLANG LISP} terms, a function is "just" another data structure that
happens to have an item on its property list called %AEXPR%* (where the
definition is stored).  As will become clear, it is at times useful to take
this view, but it is not in any sense exclusive.  Functions will be viewed
as both data structures and procedures, depending on which is the most
relevant at the moment.

.SSS(Schema organization,SCHO:);
	The schemata are the second of the three kinds of knowledge about
representations noted in {YONFIG KLEVELS}.  Each contains several different
types of information:
.BEGINLIST;
	(a)\the structure of its instances,
	(b)\interrelationships with other data structures,
	(c)\a pointer to all current instances,
	(d)\inter-schema organizational information, and
	(e)\bookkeeping information.
.ENDLIST;
	{YONFIG SCHEMA} shows the schema for the value of an attribute and
the schema for the identity of an organism.  In both, information
corresponding to each of the categories listed above is grouped together
(the numbers at the right are for reference only).
	Note that, since the %AVALUE-SCHEMA%* is the parent of the
%AIDENT-SCHEMA%* in the hierarchy, information in the former need not be
reproduced in the latter.  Hence the complete specification for an organism
identity is given by considering information in both schemata.
	Note also that the schemata use what is known as an "item-centered"
factorization and indexing of knowledge.  That is, the items  dealt
with (%AIDENT%*ities, %AVALUE%*s, etc.) are used as the main indexing
points for the body of knowledge about representations, and all information
about a particular item is associated directly with that item.  The
advantage of this approach lies in making possible a strongly modular system
in which it is relatively easy to organize and represent a large body of
knowledge.  The twenty-five or so schemata that make up that body of knowledge
encode a significant amount of information about the representation conventions
of a large and complex program.  They were reasonably easy to construct because
the individual representations are "mostly independent" (i.e., they have only
a few, well-specified kinds of interactions) and because the item-centered
organization encourages taking advantage of that modularity.

.SSSS(|Instance structure|);
	The part of the schema that describes the structure of its
instances (lines 1-7, 15-20)
is the element that corresponds most closely to an ordinary
record descriptor.  The current implementation takes a very simple view of
{PRLANG LISP} data structures.  It assumes that they are composed of a
print name, a value, and a property list, with the usual conventions for
each:# The print name is a single identifier by which the object is named,
the value is an atom or list structure, and the property list is composed
of property-value pairs.  The first three items in the first schema above
deal with each of these in turn.
	Each item is expressed as a triple of the form:
.STARTCENTERIT(A);
<slotname>  <blank>  <advice> 
.ENDCENTERIT;
(We use the term "slot" from the work on frames [[Minsky74] since the
concept is similar, but the schemata grew out of, and are fundamentally an
extension of, the idea of a record structure).  For the print name of any
value of an attribute, then, the %2slotname%* is %APNTNAME%*, the
%2blank%* is %AATOM%*, and the %2advice%* is %AASKIT%*.∪∪All symbols in the
schemata are purely tokens.  They were chosen to be mnemonic, but no
significance is attached to any particular name, and nothing depends on the
use of the particular set of names chosen.∪
.skip to line 1; STARTFIG;SELECT 6;TURN ON "→↓_";GROUP SKIP 6;
 ↓_VALUE-SCHEMA_↓
     PNTNAME       ATOM      ASKIT						→[1α 
     VAL           PNTNAME   INSLOT						→[2α 
     PLIST         [(INSTOF  VALUE-SCHEMA                           GIVENIT	→[3α 
                     DESCR   STRING                                 ASKIT	→[4α 
                     AUTHOR  ATOM                                   FINDIT 	→[5α 
                     DATE    INTEGER                                CREATEIT)	→[6α 
                    CREATEIT]							→[7α 

     STRAN         the value of an attribute					→[8α 
     FATHER        (KSTRUCT-SCHEMA)						→[9α 
     OFFSPRING     (IDENT-SCHEMA  SITE-SCHEMA)					→[10

     DESCR         the VALUE-SCHEMA describes the format for 
		   a value of an attribute	 				→[11
     AUTHOR        DAVIS							→[12
     DATE          1115								→[13
     INSTOF        (SCHEMA-SCHEMA)						→[14
. APART; <<xgenlines← xgenlines+1;>> COMMENT COMPENSATE FOR FONT 6;




.GROUP;
 ↓_IDENT-SCHEMA_↓
     PLIST         [(INSTOF  IDENT-SCHEMA                           GIVENIT	→[15
                     SYNONYM (KLEENE (1 0) < ATOM >)                ASKIT	→[16
                     AIR     (KLEENE (1 1 2) <(AIR-INST CF-INST)> ) ASKIT	→[17
                     GRAM    GRAM-INST                              ASKIT	→[18
                     MORPH   MORPH-INST                             ASKIT	→[19
                    CREATEIT]							→[20

     RELATIONS     ((ADDTO (AND* ORGANISMS)))					→[21

     INSTANCES     (ACINETOBACTER ACTINOMYCETES  ...  XANTHOMONAS YERSINA)	→[22

     STRAN         the identity of an organism					→[23
     FATHER        (VALUE-SCHEMA)						→[24
     OFFSPRING     NIL								→[25

     DESCR         the IDENT-SCHEMA describes the format for an organism	→[26
     AUTHOR        DAVIS							→[27
     DATE          1115								→[28
     INSTOF        (SCHEMA-SCHEMA)						→[29


.FIG(Two schemata,SCHEMA:);
.ENDFIG;   xgenlines← xgenlines+1;  COMMENT COMPENSATE FOR FONT 6;
.SKIP TO LINE 1;
	The %2slotname%* labels the "kind" of thing that fills the
%2blank%* and provides access to other information that aids in the
knowledge transfer process.  Slotnames are the conceptual primitives around
which representation-specific and representation-independent knowledge in
the system is organized.  All of the semantics of a print name, for
instance, are contained in the %APNTNAME%* slot and the structures
associated with it (described  in {YON2 SLOTN}).
	The %2blank%* specifies the exact format of the information
required.  A translated form of it is printed out when requesting
information from the expert and is then used to parse his response and
insure its syntactic validity.  The blank has a simple syntax but can
express a range of structures.  The term %AKLEENE%*, for instance, is taken
from the Kleene star and implies a repetition of the form within the
angle brackets.  The parenthesized numbers that follow it indicate the
typical, minimum, and maximum number of occurrences of the form. The
appearance of a term of the form %A<datatype>-INST%* indicates some
instance of the %A<datatype>-SCHEMA%*.  Thus,
.STARTCENTERIT(A);
(KLEENE (1 1 2) <(AIR-INST CF-INST)>)
.ENDCENTERIT;
from the identity schema above indicates that the aerobicity of an organism is described
by 1 or 2 lists of the form %A(<aerobicity> <certainty-factor>)%*.
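	(The following fragment suggests how a parsed response might be checked
against a %AKLEENE%* blank.  It is written in ordinary LISP rather than the
actual && source; the function names are invented, the angle brackets of the
blank are dropped, and only the %AATOM%* form is checked in any detail.)
.STARTFIG;
(DEFUN CHECK-FORM (FORM ITEM)
  ;; Minimal stand-in:  only the ATOM case of the blank syntax is
  ;; handled here; any other form is accepted without checking.
  (COND ((EQ FORM 'ATOM) (ATOM ITEM))
        (T T)))

(DEFUN CHECK-BLANK (BLANK ANSWER)
  ;; BLANK is taken to be a list (KLEENE (typical min max) form),
  ;; with trailing numbers omitted when they impose no limit; ANSWER
  ;; is the expert's parsed response, a list of items.
  (COND ((AND (CONSP BLANK) (EQ (CAR BLANK) 'KLEENE))
         (LET ((MIN  (SECOND (SECOND BLANK)))
               (MAX  (THIRD  (SECOND BLANK)))
               (FORM (THIRD  BLANK)))
           (AND (>= (LENGTH ANSWER) (OR MIN 0))
                (OR (NULL MAX) (<= (LENGTH ANSWER) MAX))
                (EVERY #'(LAMBDA (ITEM) (CHECK-FORM FORM ITEM)) ANSWER))))
        (T (CHECK-FORM BLANK ANSWER))))

;; e.g. (CHECK-BLANK '(KLEENE (1 1 2) (AIR-INST CF-INST))
;;                   '((AEROBIC 800) (FACUL 200)))    ==>  T
;; (a made-up response: two <aerobicity certainty-factor> pairs)
.ENDFIG;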
	The %2advice%* suggests how to find the information.  Various
sorts of information are employed in the course of acquiring a new concept from
the expert.  Some of it is domain specific (e.g., the gramstain of a new
organism) and clearly must be supplied by the expert.  Other parts of it are
purely representation specific.  These should be supplied by the system itself,
not only because they deal with information that the system already has (and
therefore should not have to ask), but because the expert is assumed to know
nothing about programming.  Even a trivial question concerning internal data
structure management would thus appear incomprehensible to him.  The
%2advice%* provides a way of expressing instructions to the system on where to
find the information it needs.  There are five such instructions that can be
given.
.STARTFIG;
      ASKIT      %1ask the expert%A
      CREATEIT   %1manufacture the answer%A
      FINDIT     %1the answer is available internally, retrieve it%A
      GIVENIT    %1use the contents of the blank as is (like %6QUOTE%1 in {PRLANG LISP})%A
      INSLOT     %1use the contents of the slot indicated
.ENDFIG;
	The first triple in {YONFIG SCHEMA} (line 1) indicates then that the print name
is an atom and that it should be requested from the expert.  The second
(line 2) indicates that the organism name should evaluate to its print name, and
the third (lines 3 - 7) indicates the form of the property list.  Note that the
%2blank%* for the last of these consists of, in turn,  a set of
%2slotname-blank-advice%* triples describing the property list.

.SSSS(Interrelationships);
	A second main function of the schema is to provide a record of the
interrelationships (line 21) of data structures.  The %ARELATIONS%* slot contains
this information, expressed in a simple language for describing data
structure relationships.  In BNF terms, it looks like:
.STARTFIG;
   <update>    =  ( <command> ( <switch> <structure>%B+%*)%B+%*)
   <command>   =  ADDTO | EDITFN
   <switch>    =  AND* | OR*  | XOR* | (<switch> <structure>%B+%*)
   <structure> =  <any data structure or function name>
.ENDFIG;
.continue
(The superscript "+" means "one or more.")# %AADDTO%*
indicates that some other structure in the system should be told about the new
instance, while %AEDITFN%* indicates that some function may need to be edited
as a result of creating the new instance.  The three switches indicate that the
action specified by <command> should be taken on all (%AAND*%*), 1 or more
(%AOR*%*), or exactly 1 (%AXOR*%*) of the structures that  follow.  
In the
case of a new organism, the update is a simple one, and its name is added to the
structure called %AORGANISMS%*.
	The recursive definition allows construction of conditional
expressions, as in the %ARELATIONS%* information in the schema for a
culture site:
.STARTFIG;
   ((ADDTO (XOR* STERILESITES NONSTERILESITES OTHERSITES))
    (ADDTO (AND* ALLSITES SITE-INFECT))
    (ADDTO (OR*  PATHOGNOMONIC-SITES))
    (ADDTO ((OR* NONSTERILESITES OTHERSITES) PATH-FLORA))
    (ADDTO ((AND* OTHERSITES) METHOD)))
.ENDFIG;
.continue
Here, the first three tasks are straightforward, but the fourth line
indicates that if the site is either nonsterile or indeterminant then it should
be added to the structure called %APATH-FLORA%*.  The last line indicates that
all indeterminant sites should be added to the structure called %AMETHOD%*.
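	(The sketch below, in ordinary LISP with invented names, illustrates the
intended semantics of the three switches and of a nested, conditional clause.
It is not the && implementation; in particular, %AADD-TO-TARGET%* and
%AASK-EXPERT-TO-CHOOSE%* are placeholders for machinery described later in
this chapter.)
.STARTFIG;
(DEFUN ADD-TO-TARGET (NEWITEM TARGET)
  ;; Placeholder:  the real work of adding NEWITEM to TARGET is the
  ;; subject of a later section.
  (FORMAT T "~&adding ~A to ~A" NEWITEM TARGET)
  TARGET)

(DEFUN ASK-EXPERT-TO-CHOOSE (TARGETS ONLY-ONE)
  ;; Placeholder:  the real system displays the DESCR phrase of each
  ;; target and reads the expert's selection.
  (IF ONLY-ONE (LIST (FIRST TARGETS)) TARGETS))

(DEFUN DO-UPDATE (NEWITEM CLAUSE DONE)
  ;; CLAUSE is one (<switch> <structure> ...) pair from an <update>;
  ;; DONE is the list of structures already updated by earlier
  ;; clauses.  Returns the structures updated by this clause.
  (LET ((SWITCH  (CAR CLAUSE))
        (TARGETS (CDR CLAUSE)))
    (COND ((EQ SWITCH 'AND*)                      ; update every target
           (MAPCAR #'(LAMBDA (TGT) (ADD-TO-TARGET NEWITEM TGT)) TARGETS))
          ((MEMBER SWITCH '(OR* XOR*))            ; ask which one(s)
           (MAPCAR #'(LAMBDA (TGT) (ADD-TO-TARGET NEWITEM TGT))
                   (ASK-EXPERT-TO-CHOOSE TARGETS (EQ SWITCH 'XOR*))))
          ((CONSP SWITCH)
           ;; conditional clause, e.g. ((OR* NONSTERILESITES OTHERSITES)
           ;; PATH-FLORA):  act only if the item has already gone to the
           ;; structures named in the nested switch (all of them for
           ;; AND*, at least one of them for OR*).
           (LET ((HIT (INTERSECTION (CDR SWITCH) DONE)))
             (WHEN (IF (EQ (CAR SWITCH) 'AND*)
                       (= (LENGTH HIT) (LENGTH (CDR SWITCH)))
                       HIT)
               (MAPCAR #'(LAMBDA (TGT) (ADD-TO-TARGET NEWITEM TGT))
                       TARGETS)))))))
.ENDFIG;
.continue
The clauses of a %ARELATIONS%* entry would then be processed in order,
accumulating the structures already updated, so that a later conditional
clause can test where the new item has gone.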
	The key point here is to provide the system architect with a way of
making explicit all of the data structure interrelationships upon which his
design depends.  The approach we use differs slightly from the one more
typically taken, which relies on a demon-like mechanism that uses the full
power of the underlying programming language.  We have avoided the use of
an arbitrary body of code and emphasized instead the use of a task-specific
high-level language.
	This formalization of knowledge about data structure
interrelationships has several useful applications.  First, since the
domain expert cannot, in general, be expected to know about such
representation conventions, expressing them in machine-accessible form
makes it possible for && to take over the task of maintaining them.
Second, having && attend to them insures a level of knowledge base
.ind knowledge base integrity
integrity without making unreasonable demands on the expert.  Finally, it
keeps knowledge in the system accessible since the %ARELATIONS%* make
explicit the sort of knowledge that is often left implicit, or which is embedded in
code and hence is inaccessible.  There are several advantages to this
accessibility of knowledge.  For example, by adding to && a simple analyzer
that could "read" the %ARELATIONS%*, a programmer could ask questions like
%2What else in the system will be affected if I add a new instance of this
data structure?%* or %2What are all the other structures that are related
to this one?%* This would be a useful form of on-line documentation.∪∪This
is the data structure analogue for the facility of {PRLANG INTERLISP}
called MASTERSCOPE, which can analyze a set of function definitions and
answer questions like %2Who calls function F?, Which function binds X?%*,
etc.∪
.ind documentation
	There are additional advantages that will become apparent
in {YON3 SFNDSI}, which describes how the updating is actually performed.

.SSSS(Current instances);
	Each schema keeps track of all of its current instances (line 22),
primarily for use in knowledge base maintenance.  If it becomes necessary
to make changes to the design of a particular representation, for instance,
we want to be sure that all instances of it are modified appropriately.
Keeping a list of all such instances is an obvious but very useful
solution.

.SSSS(Organizational information);
	%AFATHER%* indicates the (more general) ancestors of this schema in
the hierarchy and %AOFFSPRING%* indicates its more specific offspring
(lines 9-10).  %ASTRAN%* (line 8) is an English phrase indicating what sort
of thing the schema describes and is used in communicating with the expert.

.SSSS(Bookkeeping information);
	Much the same sort of bookkeeping information (lines 11-14)
is maintained for
each data structure as is kept for rules; %ADESCR%*iption, %AAUTHOR%*,
and %ADATE%* are the analogous items. %AINSTOF%* is the inverse of
%AINSTANCES%* and indicates which schema was used to create this data
structure.
	Note that in the current example it is the organism schema itself
that is being described by all of this bookkeeping information, and, as
shown, it is an instance of the %ASCHEMA-SCHEMA%* (described in {YON1 KAKAR}).
.SKIP TO LINE 1; SSS(Slotnames and slotexperts,SLOTN:);
	The most detailed knowledge about representations is found in the
slotnames and the structures associated with them. They deal with aspects of the
representation that are at the level of programming-language constructs and
conventions.  The overall structure of a slotname is shown below.
.STARTFIG;TURN ON "↓_";
    ↓_<slotname>_↓

  PROMPT  %1an English phrase used to request the information to fill the slot%A
  TRANS   %1an English phrase used when displaying the information%A
	  %1  found in the slot%A
  EXPERT  %1the name of the slotexpert%A

.FIG(Information associated with a slotname);
.ENDFIG;
The %APROMPT%* and %ATRANS%* are
part of the simple mechanism that makes the creation of a new data structure an
interactive operation.  The former is used to request information, the
latter is used when it is necessary to display information that has
previously been deposited in a slot.∪∪The
ideas of a %APROMPT%* and a %ATRANS%* were adapted from work in [[Shortliffe76].∪
	Associated with each slotname is a procedure called a %2slotexpert%*
(or simply, %2expert%*).
It serves primarily as a repository for useful pieces of knowledge
concerning the implementation of the representations.  For example, names of data
structures have to be unique to avoid confusion or inadvertent mangling.  Yet,
in knowledge acquisition, new data structures are constantly being created and
many of their names are chosen by the user.  Part of the task carried out by the
%2expert%* associated with the %APNTNAME%* slot is to assure this uniqueness.
	The slotexperts are organized around the different sorts of advice
that
can be used in a slot. Their general format is shown below.  Since not all
pieces of advice are meaningful for all slotexperts, in general not every
slotexpert has an entry for every piece of advice.
.STARTFIG;
 (<slotexpert> [LAMBDA (BLANK ADVICE)
		       (SELECTQ ADVICE
			       (ASKIT    %5α#α#α#%*)
			       (CREATEIT %5α#α#α#%*)
			       (FINDIT   %5α#α#α#%*)
			       (INSLOT   %5α#α#α#%*)
			       (GIVENIT  %5α#α#α#%*)    %1etc.%*])
.LONGCAPFIG(The structure of a slotexpert, |%ASELECTQ%* can be thought of as 
. a %2case%* statement for
. symbolic computation. Thus the code above is equivalent to %2if ADVICE = ASKIT
. then ... else if ADVICE = CREATEIT then ...%* etc.|);
.ENDFIG;
	The individual chunks of code that make up the parts of the
%2expert%*s are the smallest units of knowledge organization in our
framework.  They embody knowledge about things like where to find or how to
create the items needed to fill the %2blank%* for a particular slot.  For
instance, we noted that the %2expert%* associated with the %APNTNAME%* slot
insures the uniqueness of names that are supplied by the user.  This
routine would be found in the %AASKIT%* section of the %2expert%*.  Code in
the %ACREATEIT%* section uses a number of heuristics that help to generate
print names that are between 4 and 10 characters long and that are reasonably
mnemonic. This is used when the system itself creates a name for a new
internal data structure.∪∪While the slotnames are currently globally unique
(the %ASYNONYM%* slot in {YONFIG SCHEMA}, for instance, is presumed to mean
the same thing for all types of data structures), this is not critical to
the formalism.  Slotnames could easily be made local to a given schema, and
the schema name would become another index in the knowledge organization
framework.  Thus, instead of indexing the knowledge in the slotexperts by
slotname and advice, we would index by schema name, slotname, and advice.
The power and limitations of the framework would remain unchanged.∪ 
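	(The fragment below sketches how such an %2expert%* might look.  It is
written in ordinary LISP with invented names and is not the actual && code; the
name-generating heuristic shown is only one plausible choice, and the blank is
assumed to carry the descriptive phrase in the %ACREATEIT%* case.)
.STARTFIG;
(DEFVAR *ALL-KNOWN-NAMES* NIL)      ; stand-in for the knowledge base

(DEFUN KNOWN-NAME-P (NAME)
  (MEMBER NAME *ALL-KNOWN-NAMES*))

(DEFUN PNTNAME-EXPERT (BLANK ADVICE)
  ;; Only the two branches discussed in the text are sketched.
  (CASE ADVICE
    (ASKIT                          ; get a name from the expert and
     (FORMAT T "~&Please give a name for the new structure: ")
     (LET ((NAME (READ)))           ; insist that it be a new one
       (LOOP WHILE (KNOWN-NAME-P NAME)
             DO (FORMAT T "~&~A is already in use; try another: " NAME)
                (SETQ NAME (READ)))
       NAME))
    (CREATEIT  ; manufacture a name; the blank is assumed to carry
               ; the descriptive phrase in this case
     (MAKE-MNEMONIC-NAME BLANK))
    (OTHERWISE BLANK)))

(DEFUN MAKE-MNEMONIC-NAME (PHRASE)
  ;; One plausible heuristic, not necessarily the one && uses:  run
  ;; the words of the phrase together, truncate to 10 characters, and
  ;; append a digit if the result is already taken.  (The 4-character
  ;; lower bound mentioned in the text is ignored here.)
  (LET* ((BASE (FORMAT NIL "~{~A~}" PHRASE))
         (NAME (INTERN (SUBSEQ BASE 0 (MIN 10 (LENGTH BASE))))))
    (LOOP FOR I FROM 1 WHILE (KNOWN-NAME-P NAME)
          DO (SETQ NAME (INTERN (FORMAT NIL "~A~D" NAME I))))
    NAME))
.ENDFIG;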
.SKIP 1;
	Recall that we set out to describe representations in order to make
possible the interactive acquisition of new conceptual primitives.  The
slotname and associated expert organize the knowledge needed and provide
the English to make the operation interactive.  The blank provides an
indication of the format of the answers to questions and a check on their
syntax. The advice allows the embedding of an additional sort of knowledge
that makes the process function efficiently and "intelligently."

.SSSS(|Slotnames as data structures, "circularity" of the formalism|);
	While discussing the use of a typed language, it was noted that
everything in the system should be an instance of some schema.  One implication
of this was that both the schemata and their components were themselves 
considered extended data types.  Evidence of this can be seen in the slotnames.
There is a %ASLOTNAME-SCHEMA%* that describes the structure of a slotname
and  makes  it possible to acquire  new slotnames interactively.
	One of the consequences of this approach is a circularity in the
definitions of the data types.  For instance, %ADESCR%*iption is a slotname
and, hence, an instance of the %ASLOTNAME-SCHEMA%*.  But part of the structure
of every slotname is a %ADESCR%*iption specifying what that slotname
represents.  Hence, there is a %ADESCR%*iption of %ADESCR%*iption.
Similarly, in acquiring a new slotname, the system requests a prompt for it,
using the prompt for %APROMPT%*: %2Please give me a short phrase which can be
used to ask for the contents of this slot%*.  This circularity is a result of
the systematic application of the use of the extended data types and makes
possible the sort of bootstrapping behavior demonstrated later in this chapter.
.SKIP TO LINE 1;  SS(Knowledge about representations:# Use,KARU:);
	{YON1 KARO} described the organization and content of the knowledge
about representations embodied in the schemata and associated structures.
This section describes how that information is used; in particular, the way
it enables the expert to teach the system about new conceptual primitives.
Other uses (e.g., for information storage and retrieval) are also
described.

.SSS(Schema function:# Acquisition of new instances,SFANI:);
	We begin at the point where some schema in the network has been
selected as a starting point ({YON2 WTS} discusses how this decision is made).
Since information is distributed through the schema network, the first step
is to get to the root, keeping track of the path while ascending.  &&
"climbs" up the %AFATHER%* links, marking each schema along the way.∪∪If it
encounters a schema that has multiple parents, it jumps directly to the
network root.  This is a sub-optimal solution; a better approach would have
a more sophisticated treatment of the network. It might, for instance, be
able to recognize the situation in which all the parents had a common
"grandparent" and thus jump only two levels (over the ambiguous section),
rather than straight to the root.∪
The system eventually arrives at the root, with all or some part of the
path marked back down to a terminal schema.  (Parts may be unmarked either
because it jumped over non-unique parents or because the starting point
chosen was not a terminal of the network.  The latter case would arise if,
for instance, && knew only that the expert wanted to create a new kind of
value but was not able to discover which type.)
	The next step is to descend back down the network along the marked
path, using each schema along the way as a further set of instructions for
acquiring the new instance.  If the process encounters a part of the path
that is not marked, the expert's help is requested.  This is done by
displaying the English phrase (the %ASTRAN%*) associated with each of the
%AOFFSPRING%* of the current schema and asking the expert to choose the
one which best describes the item being constructed.
	At each node in the network the acquisition process is directed by
a simple "schema interpreter" whose control structure consists of three
basic operations: 
.beginlist;
(a)\Use the structure description part of the schema to guide the addition
of new components to the instance,
.SKIP 1;
(b)\attend to any updating according to the information specified in the
%ARELATIONS%*, and
.SKIP 1;
(c)\add the new item to the schema's list of instances.∪∪For the
sake of efficiency, only schemata at the leaves of the network keep track
of instances.  Each new item carries a record of its path through the
network (in its %AINSTOF%* property); this allows disambiguation when a
schema has more than one parent in the network.∪
.endlist;
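	(A schematic rendering of the ascent, in ordinary LISP with invented
names; it is not the actual && code.  The %AFATHER%* links are assumed to be
kept on property lists, the root is written in as the %AKSTRUCT-SCHEMA%*, and
the business of consulting the expert at unmarked branch points is omitted.)
.STARTFIG;
(DEFUN SCHEMA-PATH (START-SCHEMA)
  ;; Return the schemata from the root down to START-SCHEMA by
  ;; climbing the FATHER links (kept here on property lists).  A
  ;; schema with more than one father causes a jump straight to the
  ;; root, as in the footnote above; the root is written in here as
  ;; KSTRUCT-SCHEMA.
  (LET ((PATH (LIST START-SCHEMA)))
    (LOOP FOR FATHERS = (GET (CAR PATH) 'FATHER)
          WHILE FATHERS
          DO (PUSH (IF (CDR FATHERS) 'KSTRUCT-SCHEMA (CAR FATHERS))
                   PATH))
    PATH))

;; e.g.  (SETF (GET 'VALUE-SCHEMA 'FATHER) '(KSTRUCT-SCHEMA)
;;              (GET 'IDENT-SCHEMA 'FATHER) '(VALUE-SCHEMA))
;;       (SCHEMA-PATH 'IDENT-SCHEMA)
;;          ==>  (KSTRUCT-SCHEMA VALUE-SCHEMA IDENT-SCHEMA)
;;
;; The interpreter then walks this path from the root back down,
;; performing operations (a), (b), and (c) at each schema and asking
;; the expert to choose among the OFFSPRING wherever the path is
;; unmarked.
.ENDFIG;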

.SSSS(Adding to the structure of the new concept);
	The process of adding new components to the new instance involves
filling in slots, as guided by the information provided in the %2blank%*.
Computationally, the process involves sending the %2blank%* and %2advice%*
as arguments to the appropriate slot expert:{FOOTNOTEPREFACE}
.SEND FOOT ⊂ FOOTNOTESEND
%AAPPLY*%* applies its first argument to its remaining arguments.  Thus,
.ONCE CENTER; SELECT A;
(APPLY* (QUOTE CONS) (QUOTE A) NIL) = (A)##.
. ⊃;
.STARTCENTERIT(A)
(APPLY* (GETEXPERT <slotname>) <blank> <advice>)
.ENDCENTERIT;
The segment of code in the %ASLOTEXPERT%* associated with the %2advice%*
then determines how to go about filling in the blank.  For
example, when that %2advice%* is %AASKIT%*, the expert is
consulted.  As we have seen, this appears to the expert as a process of
supplying information in a form specified by the system:# && first prints a
"translated" version of the %2blank%* to guide the expert, then uses the same
%2blank%* to parse his response.
	This approach makes possible a particularly simple form of "schema
interpreter."# For instance, the part of the interpreter that handles this
addition of new substructure is just the single line of code shown above.  The
task of filling in the blank is thus handed off to the appropriate %2expert%*.
The %2expert%*, in turn, hands it to the segment of code associated with the
indicated piece of %2advice%*.  That code may, in turn, request each part of
the %2blank%* to supply a "translation" of itself for display to the user.
Thus, rather than trying to write a clever interpreter that had a lot of
information about each representation, we have instead written a simple interpreter
and allow the representations themselves to supply the information.
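	(Rendered in ordinary LISP, the hand-off looks roughly as follows;
%AGETEXPERT%* is assumed here to retrieve the %AEXPERT%* entry of the slotname
from its property list, and %AFILL-SLOT%* is an invented name for the
interpreter step, not the actual && code.)
.STARTFIG;
(DEFUN GETEXPERT (SLOTNAME)
  ;; Assumed here to retrieve the EXPERT entry of the slotname
  ;; (cf. the slotname figure above) from its property list.
  (GET SLOTNAME 'EXPERT))

(DEFUN FILL-SLOT (SLOTNAME BLANK ADVICE)
  ;; An invented name for the interpreter step quoted above:  hand
  ;; the blank and the advice to the slot's expert and let it decide
  ;; how to fill the blank.
  (FUNCALL (GETEXPERT SLOTNAME) BLANK ADVICE))

;; e.g.  (SETF (GET 'PNTNAME 'EXPERT) 'PNTNAME-EXPERT)
;;       (FILL-SLOT 'PNTNAME 'ATOM 'ASKIT)
.ENDFIG;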
.ind human engineering
	There are also several human engineering features available when
the advice indicates that the information to fill the slot should be
requested from the expert.  We have seen the use of the %2blank%* in guiding
the expert and in parsing his answer.  There is also the ability to display a
sample answer (in response to a "?" ) or all legal answers (in response to
a "??").  All of these help to make the interaction relatively painless.

.SSSS(Attending to data structure interrelations,SFNDSI:);
	The next step--dealing with necessary updates to other
structures--relies on the information specified in the %ARELATIONS%* slot.  The basic idea
is to consider this information as a list of potential updating tasks to be
performed whenever a new instance of the schema is acquired.
	Maintaining existing interdependencies of data structures in
the face of additions to the system requires three kinds of information:
.BEGINLIST;
	(1)\What other structures might need to be updated in response to the new addition?
	(2)\If those other structures are not all independent, what interrelationships exist between them?
	(3)\What effect should the new addition have on each structure?
.ENDLIST;
	As an example, consider the acquisition of the new culture site shown earlier.
The first updating task encountered is the decision whether to add the
new site to the collection of sterile, nonsterile, or indeterminant sites.  We
.ind trigger
.ind target
describe this by saying that the data type %ASITE%* is the "trigger" for an
action that may need to be performed on one or more "targets."  In these
terms, the targets are the answers to question 1 above (other structures that
may need to be updated), and the fact that the categories are mutually exclusive
is the answer to question 2 (the constraints among the targets).  Information
needed to answer question 3 (the effect on each target)
may come from two sources.  First, the data type and organization of the target
is always relevant.  A partially ordered list, for example, will be updated one
way, while a set will be updated in another.  Second, the trigger may or may not
carry relevant information.  In the example above, it does not.  Adding a new
site to any one of the three categories requires no information about the site
itself; the system need only know to which target it should be added.  The
approach used here is particularly well suited to this situation (in which the
trigger does not determine the effect on the target) and takes advantage of it
by minimizing the distribution of the required knowledge (this point will be
clarified below).
	The "language" of the %ARELATIONS%* is
a syntax of data structure interrelationships and provides a way of
expressing the answers to questions 1 and 2.  For the current example, part of
the %ARELATIONS%* of the %ASITE-SCHEMA%* is
.STARTCENTERIT(A);
(XOR* STERILESITES NONSTERILESITES OTHERSITES)
.ENDCENTERIT;
which indicates which structures are potentially affected and the
constraint of mutual exclusion.  The information for question 3 is supplied
.beginind updating function
by updating functions (described below), which are included in some of the
schemata.

	One example of how the updating process works will make all of this
clearer and illustrate the advantages it presents. The first step is to
determine which structures should actually be updated.  If the %A<switch>%* is
%AOR*%* or %AXOR*%*, the expert's help is requested;∪∪Recall that all data
structures in the knowledge base have associated with them a descriptive
English phrase (the %ADESCR%* part) supplied during the acquisition
process.  It is this description that allows && to "talk" about various
data structures.∪ otherwise (%AAND*%*, or a recursive definition), the
system itself can make the decision.  In this case the system displays the
three choices (%2sterile, nonsterile, %*and%2 indeterminant%*) and asks
the expert to select one.
	The rest of the process can best be viewed by adopting the
perspective of much of the work on "actors" [[Hewitt75] and the 
{PRLANG SMALLTALK} language [[Learning76], in which data structures are
considered active elements that exchange messages.  In these terms, the
next step is to "send" the new culture site to the target selected
(%AOTHERSITES%*), along with the command to the target to "Add this to
yourself."# The target "knows" that knowledge about its structure is stored
with the schema of which it is an instance, so it finds a way to pass the
buck:# It examines itself to find out which schema it is an instance of
(i.e., it examines the contents of its %AINSTOF%* slot).
Determining that it is an instance of the schema for alphabetically ordered
linear lists (the %AAOLL-SCHEMA%*),  it sends a request to this schema,
asking the schema to take care of the "add this" message.
	Recall that the schema is a device for organizing a wide range of
information about representations.  Part of that information indicates how
to augment existing data structures.  The %AAOLL-SCHEMA%* (like others) has
an "updating function" capable of adding new elements to its instances
(alphabetically ordered linear lists) without violating their established
order.  Thus, in response to the request from %AOTHERSITES%*, the
%AAOLL-SCHEMA%* invokes its updating function on the new culture site and
the list %AOTHERSITES%*, adding the new element to the list in the proper
place.
	To review:
.BEGINQUOTE;
	%ASITE-SCHEMA%* asks the expert if the new site is %2sterile%*, %2nonsterile%*, or %2indeterminant%*.
	The expert indicates %2indeterminant%*.
	%ASITE-SCHEMA%* sends the new site to %AOTHERSITES%*, with the message "Add this to yourself."
	%AOTHERSITES%* examines itself, finds it is an instance of %AAOLL-SCHEMA%*,
and sends a message to %AAOLL-SCHEMA%* saying "Add this new site to me."
	The updating function associated with %AAOLL-SCHEMA%* adds the new site to %AOTHERSITES%*.
.ENDQUOTE;
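	(The fragment below restates this message sequence as ordinary LISP, with
invented names; it is not the actual && code.  The %AINSTOF%* pointer and the
updating function are assumed to live on property lists, and the alphabetical
insertion shown is only one plausible rendering of the %AAOLL-SCHEMA%*'s
updating function.)
.STARTFIG;
(DEFUN ADD-TO-YOURSELF (TARGET NEW-ELEMENT)
  ;; The "Add this to yourself" message, rendered as a function call:
  ;; the target finds the schema it is an instance of and hands the
  ;; work to that schema's updating function.
  (LET* ((SCHEMA   (CAR (GET TARGET 'INSTOF)))
         (UPDATEFN (GET SCHEMA 'UPDATEFN)))
    (FUNCALL UPDATEFN TARGET NEW-ELEMENT)))

(DEFUN AOLL-UPDATE (TARGET NEW-ELEMENT)
  ;; One plausible updating function for the AOLL-SCHEMA:  insert the
  ;; new element into an alphabetically ordered linear list, assumed
  ;; here to be kept as the VALUE property of the target.
  (SETF (GET TARGET 'VALUE)
        (MERGE 'LIST (LIST NEW-ELEMENT) (GET TARGET 'VALUE)
               #'STRING< :KEY #'STRING)))

;; e.g.  (SETF (GET 'OTHERSITES 'INSTOF) '(AOLL-SCHEMA)
;;              (GET 'AOLL-SCHEMA 'UPDATEFN) 'AOLL-UPDATE
;;              (GET 'OTHERSITES 'VALUE) (LIST 'BLOOD 'SPUTUM))
;;       (ADD-TO-YOURSELF 'OTHERSITES 'URINE)
;;       (GET 'OTHERSITES 'VALUE)   ==>  (BLOOD SPUTUM URINE)
.ENDFIG;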
.ind distribution of knowledge
	There are several advantages to the  distribution of
knowledge this technique employs.  To make them clear, consider the generalized
view of the process shown in {YONFIG TDIRS}:  one trigger (%ASITE%*), three
.beginind traffic director
targets, and a structure called a "traffic director" (which is a generalized
version of the %ARELATIONS%*).  In this view, each schema would have its own
traffic director that tells it what to do with new instances.  The basic issue
is organization of knowledge and, in particular, how that knowledge should be
distributed between the updating functions and the traffic director.
.STARTFIG;
.BOXFIG;
  ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃     ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃     ⊂∂∂∂∂∂∂∂∂∂∂∂∂⊃
  }   updating   }     }    updating     }     }  updating  }
  }  function-1  }     }   function-2    }     } function-3 }
  ε ∂ ∂ ∂ ∂ ∂ ∂ ∂λ     ε ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ λ     ε ∂ ∂ ∂ ∂ ∂ ∂λ
  }   target-1   }     }    target-2     }     }  target-3  }
  }              }     }                 }     }            }
  } STERILESITES }     } NONSTERILESITES }     } OTHERSITES }
  α%∂∂∂∂∂∂∂∂∂∂∂∂∂∂$     α%∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$     α%∂∂∂∂∂∂∂∂∂∂∂∂$




		       ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃
		       } traffic director }
		       ε ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂ ∂λ
		       }   SITE-SCHEMA    }
		       α%∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$
.WIDECAPFIG(Generalized view of attending to data structure interrelations.,TDIRS:);
.ENDFIG;
	In the current example, a new culture site is "sent" to
the traffic director for instructions.  It might receive three kinds of
directions, depending on how much information is stored there.  The traffic
director might know:
.BEGINLIST; INDENT 0,15,0; TABS 16;
	DESIGN A:\the names of the targets.
	DESIGN B:\the names of the targets and the constraints among them.
	DESIGN C:\the names of the targets, the constraints among them, and the structure
of each target.
.ENDLIST;
	Design C organizes the updating process around each trigger,
.ind demon
corresponding closely to the standard demon-like approach.  In this case the
traffic director can "tell" the new site exactly which target(s) to go to and
how to "add itself" to each. This would mean most of the knowledge is stored in
the traffic director, which has to know  both the organization and the
structure of all current targets.
	Design A organizes the process around the target.  Here the traffic
director can only say "Here's all the places you might (or might not) belong,
try them all and ask when you get there."# In this case the bulk of the
knowledge is stored with the updating functions:# They will be responsible for
adding the new site to the target and for maintaining the necessary interrelations
between the targets.
	Design B is the alternative embodied in the %ARELATIONS%*.  In this case the traffic director
can decide exactly which target(s) the new site should be added to, but it does not
know how to add it there.  This time it would say, "Here's where you belong, ask
about how to be added when you get there."
	Now consider the advantages and difficulties associated with each alternative.
	Design A requires that the targets (or the expert) take responsibility for maintaining
the necessary constraints.  In terms of the current example, this would mean
either including in the updating function for each category of site a test
to insure mutual exclusion with the other two categories (which would be
slow and redundant) or, when
asking the expert about each category of site individually, relying on him to
maintain the requirement of mutual exclusion (which would be slow,
redundant, and less reliable).  The traffic director in design B has enough
information to present the expert with a single coherent picture of the choice
to be made (e.g., asking him to choose just one of the three alternatives),
rather than requiring him to reconstruct it from a sequence of questions.
	Design C has the disadvantage that
adding a new representation to the system would be rather involved, since
describing its traffic director would require keeping in mind the structure of
each target.  In addition, any changes in the structure of the targets would be
harder to accommodate since knowledge about that structure might be widely
distributed among several traffic directors.  With alternative B, all the
necessary changes can be made by editing a single schema.
	Design B  has the advantage that all of the information relevant
to a representation is associated directly with it.  This offers a convenient
framework for its initial acquisition and can mean modifications are easier to
make.  In terms of these alternatives, it can be seen as a compromise between
having the information associated primarily with the trigger (i.e., stored in
the traffic director, as in alternative C) and having it associated primarily
with the targets (i.e., stored in the updating functions, as in alternative A).
.ind distribution of knowledge
In the most general terms, Design B succeeds because %2it keeps the distribution
of information about data structures constrained to the smallest number of locations%*.

	Note that this updating technique is applicable to a wide range of
data structures.  %ASITE-INFECT%*, for instance, is a table with a culture
site labeling each row and an organism identity labeling each column.  The
entry in that row and column is the CF that an infection at site
<rowname> is caused by the organism <columnname>.  A newly acquired
site will eventually be sent to %ASITE-INFECT%* as part of the response to
the updating command:
.STARTCENTERIT(A);
(ADDTO (AND* ALLSITES SITE-INFECT))
.ENDCENTERIT;
In this case, %ASITE-INFECT%* sends a request to the schema of which it is an
instance; this schema then invokes its updating function,
which results in the interaction seen earlier in the trace ("%AWhat are the likely
pathogens to be found at the site: URINE?%*").  The answer is used to create a
new row in the %ASITE-INFECT%* table.
	The caveat mentioned above should be reemphasized.
The current design scheme takes advantage of a degree of modularity in the data
structures.  It is applicable only where target updating is not dependent on
extensive information from the trigger.  That is, the updating functions
of each target in {YONFIG TDIRS} must be able to add new elements to their
targets without knowing which traffic director sent them the new element.  Since
this modularity is not present in all data structure designs, it forms a
limiting factor in the approach.
.endind traffic director
.endind updating function

.SSSS(Noting the new instance);
	The final step in "interpreting" a schema is to add the newly created
structure to the list of %AINSTANCES%* of the schema.  This is done primarily
for bookkeeping purposes, but it also has other useful applications which will be demonstrated
later.


.SSS(Where to start in the network,WTS:);
	The description of the use of schemata to guide acquisition
assumed that the question of where to start in the schema hierarchy had already
been settled.  While the mechanisms used to make this decision are not complex,
they illustrate an interesting issue.
	One mechanism provides a default starting place for the case in which
the user indicates, outside of the context of any consultation, that he wants to
teach the system about some new instance.  (While we have seen the acquisition
of a new value illustrated in the context of rule acquisition, it is also
possible to acquire new instances of any data type as a separate operation.)
Since there is no context to rely on, the default is to start at the root of the
schema network and ask the expert to choose the path at every branch point.
This presents a reasonable dialog since it requests from the expert a
progressively more detailed specification of the concept he has in mind.  
Each individual inquiry will appear sensible since, without
contextual information, there is no way the system could have deduced the
answer.  (In the excerpt below, only the sequence of questions is shown;
everything else has been edited out.)
.STARTFIG;APART;
++** %3?%*
Commands are
	NR - enter a new rule
	ER - edit an existing rule
	DR - delete rule
	NP - enter a new primitive (attribute, value etc.)
++** %3NP%*

Which of the following best describes the new primitive?
[Choose the last if no other is appropriate]
     1 - an attribute, or
     2 - a value of an attribute, or
     3 - None of the above
Choose one
++**%3 1%*
.STARTCOM;
{At this point, acquisition of the new item would begin; it is omitted here.}
.ENDCOM;

Which of the following best describes the new attribute?
[Choose the last if no other is appropriate]
     1 - an attribute of a patient
     2 - an attribute of a infection
     3 - an attribute of a culture
     4 - an attribute of a organism
     5 - None of the above
Choose one
++**%33%*
.STARTCOM;
{Here we would see additional acquisition of information  about the item; again,
omitted.}
.ENDCOM;

Which of the following best describes the new attribute?
[Choose the last if no other is appropriate]
     1 - a single-valued attribute, or
     2 - a multi-valued attribute, or
     3 - an attribute whose value is "true" or "false", or
     4 - None of the above
Choose one
++**%3 3%*
.ENDFIG;
	When a new concept is mentioned during a rule acquisition, however,
there is an extensive amount of context available.  The same sort of default
approach would look "dumb" in this case, since there are numerous clues
indicating which kind of data type is being mentioned.  In the example in
{YON2 NEWORG}, for instance, it was not difficult to discover that the
concept was a new identity of an organism.
As was indicated, this is accomplished by some simple
.ind pattern matching
pattern matching.  Each schema in the network has one or more patterns
associated with it.  For example, the pattern
.STARTCENTERIT(A);
the <attribute> of <object> is --
.ENDCENTERIT;
is associated with the %AVALUE-SCHEMA%*.  Each of these is tested against the
line of text that prompted acquisition of the new item, and the outcome
supplies a starting place in the network.  (If all matches fail, the system
starts as before with the root of the network.)
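	(A miniature version of such a matcher is sketched below in ordinary LISP,
with invented names; the form the patterns actually take in && may differ.  A
pattern element beginning with "?" matches any single word and "--" matches the
rest of the line.)
.STARTFIG;
(DEFVAR *SCHEMA-PATTERNS*
  ;; Illustrative table only.
  '(((THE ?ATTRIBUTE OF ?OBJECT IS --) . VALUE-SCHEMA)))

(DEFUN MATCH-PATTERN (PATTERN WORDS)
  ;; A "?" element matches any single word, "--" matches the rest of
  ;; the line, and anything else must match literally.  A real
  ;; matcher would at least also check that the word matched by
  ;; ?ATTRIBUTE names a known attribute.
  (COND ((NULL PATTERN) (NULL WORDS))
        ((EQ (CAR PATTERN) '--) T)
        ((NULL WORDS) NIL)
        ((AND (SYMBOLP (CAR PATTERN))
              (CHAR= (CHAR (STRING (CAR PATTERN)) 0) #\?))
         (MATCH-PATTERN (CDR PATTERN) (CDR WORDS)))
        ((EQ (CAR PATTERN) (CAR WORDS))
         (MATCH-PATTERN (CDR PATTERN) (CDR WORDS)))
        (T NIL)))

(DEFUN STARTING-SCHEMA (WORDS)
  ;; Return the schema associated with the first pattern that matches
  ;; the offending line of text, or the network root if none does.
  (DOLIST (ENTRY *SCHEMA-PATTERNS* 'KSTRUCT-SCHEMA)
    (WHEN (MATCH-PATTERN (CAR ENTRY) WORDS)
      (RETURN (CDR ENTRY)))))

;; e.g.  (STARTING-SCHEMA '(THE IDENTITY OF ORGANISM-1 IS PSEUDOMONAS))
;;          ==>  VALUE-SCHEMA       (a made-up line, already broken
;;                                   into a list of words)
.ENDFIG;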
	The patterns thus make it possible to use contextual information from the
rule acquisition dialog in order to select a starting place in the schema network.  Note
that this link between the natural language dialog and the data type hierarchy
represents part of the semantics of each data type.  Since the schemata were
designed initially to represent only the syntax of the data types, at present
they contain only the very limited and somewhat ad hoc semantic information in
the patterns.  Such information is clearly needed, however, and would
represent a useful and natural extension to the current implementation.  It
would mean that, along with the syntax of each data type, some of its semantics
would be described, perhaps in the form of a more systematic set of patterns
than those currently in use, or other more sophisticated devices.  The system
would then always start at the root of the network and could use the semantic
information stored with each schema to take advantage of context from the
dialog, guiding its own descent through the network.

.SSS(Schema function:# Access and storage,SFAS:);
	The schema concept was introduced by describing it as an extension of the
notion of a record structure, and we have seen how it guides the acquisition of
a new instance.  A second important use of record structures is for access and
storage, and the schemata have the parallel capability.  If slotnames are viewed
as analogous to the fields of a record, then the mechanism used in && looks quite
similar to the standard record %2fetch%* and %2store%* operations.
	Our approach is based on generalizing the use of the %2advice%*
construct by using four additional types of advice (shown below).  To carry out a
storage or access operation, the relevant slotexpert is sent the name of the
data structure and one of these pieces of advice.
.STARTFIG;
  GETONE  %1retrieve one instance of whatever fills this slot in the%A
	  %1indicated data structure%A
  GETALL  %1retrieve all instances of whatever fills this slot in the%A
	  %1indicated data structure%A
  GETNEXT %1retrieve one new instance at each request%*
  STOREIT %1store an item in the slot of the indicated data structure
.FIG(Four pieces of advice used for access and storage);
.ENDFIG;
.continue; skip
For instance
.STARTCENTERIT(A);
(APPLY* (GETEXPERT AIR) 'E.COLI 'GETONE)
.ENDCENTERIT;
will retrieve one of the aerobicity values of %AE.COLI%*, while
.STARTCENTERIT(A);
(APPLY* (GETEXPERT AIR) 'E.COLI 'GETNEXT)
.ENDCENTERIT;
.ind generator
functions as a generator and will retrieve them all one by one.  Since the
slotexperts are organized around the pieces of advice, the relevant code
for storage or retrieval is similarly organized in each slotexpert.
	This feature, too, has been influenced by the work on actors and
{PRLANG SMALLTALK} noted earlier.  However, where that work concentrates on
issues of programming and program correctness, we intend here nothing
quite so formidable.  We use it because it is a natural extension of our
approach, with much the same perspective on the organization of knowledge.
	Our implementation offers, for instance, the well-known benefits
supplied by any record-like structure that provides a level of
.ind insulation
"insulation" between representation and implementation.  The use of
slotexperts and advice turns accessing a structure into a process of
sending a request to the data structure itself, which then "answers" by
providing (or storing) the desired item.  All access and storage is thus
funneled through the individual structures (via the slotexperts), and
explicit reference is made to the configuration of the structure in only
one place in the system.
.ind knowledge organization
As with standard record structures, this technique makes it possible to access  a
data structure without reference to the details of how it is actually
stored and without the need to change the code if the storage
implementation is modified.  In addition, the slotexperts make it easy to
use an arbitrary function for storage and retrieval.  Dates, for example,
are stored in the system as integers (for efficiency), and the
%ADATE-EXPERT%* takes care of decoding and encoding them on access and
storage.
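	(The fragment below suggests what the %ADATE-EXPERT%* might look like.  It
is a sketch only:  the MMDD packing and the simplified calling conventions are
assumptions made for illustration, not a description of the actual && code.)
.STARTFIG;
(DEFVAR *MONTHS* '(JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC))

(DEFUN DATE-EXPERT (DATE ADVICE)
  ;; Sketch of the encoding/decoding behavior described above.  The
  ;; MMDD packing (e.g. 1115 for NOV 15) is an assumption.
  (CASE ADVICE
    (STOREIT   ; DATE arrives as a (month day) list; pack it
     (+ (* 100 (1+ (POSITION (FIRST DATE) *MONTHS*))) (SECOND DATE)))
    (GETONE    ; DATE arrives as the stored integer; unpack it
     (LIST (NTH (1- (FLOOR DATE 100)) *MONTHS*) (MOD DATE 100)))
    (OTHERWISE DATE)))

;; e.g.  (DATE-EXPERT '(NOV 15) 'STOREIT)   ==>  1115
;;       (DATE-EXPERT 1115 'GETONE)         ==>  (NOV 15)
.ENDFIG;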

.SKIP TO LINE 1; TRACESEC( Acquiring a new attribute,NATT:);
	A more sophisticated example--the
acquisition of a new attribute--will illustrate several other
aspects of our approach to handling data structures.  It will demonstrate, for
instance, the utility of the schema network as a device for structuring the
acquisition process.  The network is used to organize the dialog and to
insure that
the expert is presented with a comprehensible sequence of questions.  It also
offers a foundation for adding new data structures to the system, making the
task reasonably simple both conceptually and computationally.
	Since an attribute is a more complex
structure than those encountered in previous examples, the acquisition
process is correspondingly more complex.  In particular, the presence of
several different data types in the substructure of the attribute guides the
dialog through a parallel sequence of several different topics.
	The example presents a borderline case for the distinction
drawn earlier between formalism and degree of expertise.  To describe a
new attribute it is necessary to describe the values associated
with it.  It may happen that these values require a data type as yet unknown to
the system, and creating that data type becomes, in turn, a new subproblem.
But describing a new kind of data type is a complex operation that
typically has an impact on the underlying formalism of the performance
program.  The current example is a borderline case because that impact can
be avoided, but it will demonstrate how the problem arises.
	The dialog below is presented in two parts.  After the first half there is
a brief review of what has happened so far and an indication of what is yet to
occur.  Recall that the knowledge base appears as it would during the early
stages of construction.

.BEGIN "TRACE" STARTRACE;

The new rule will be called RULE383
 If:    1 -%3  THE ORGANISM IS A GRAM NEGATIVE ROD%*
   and  2 -%3  PATIENT IS A COMPROMISED HOST%*
   and  3 -%3  THE NUTRIENT OF THE CULTURE MEDIUM IS%*
	   %3  BLOOD-AGAR%*
   and  4 -  
 Then:  1 -%3  THERE IS EVIDENCE (.8) THAT THE CATEGORY IS%*
	   %3  ENTEROBACTERIACEAE%A
   and  2 -  

This may take a few moments...

I don't understand
          THE NUTRIENT OF THE CULTURE MEDIUM IS BLOOD-AGAR
because these words are unknown: NUTRIENT MEDIUM BLOOD-AGAR
.STARTCOM;
{The expert is typing in a new rule and uses several words && doesn't
recognize.  As before, the system indicates its inability to parse one of his
lines and displays the unfamiliar words.}
.ENDCOM;

Would you care to try rephrasing that?
++** %3Y%*
  (Please use *'s to mark what you think might be a new 
   attribute or a new value of an attribute)
        3 -  %3* THE NUTRIENT * OF THE CULTURE MEDIUM IS%*
	     %3  BLOOD-AGAR%*
It looks as though
     THE NUTRIENT
refers to a new attribute, correct?
++**%3 Y%*
 ...is it a new attribute of the culture?
++**%3 Y%*
.STARTCOM;
{Once again, the language is sufficiently simple that with the relevant phrase
marked off, straightforward pattern matching succeeds in determining what the
new item is. The system then retrieves the schema for attributes and begins to
fill it out.}
.ENDCOM;

  Now tell me a few things about this new attribute ...

  If there is one word which neatly sums up "THE NUTRIENT",
  please give 
  it (else press RETURN)
  ++**%3 NUTRIENT%*
.STARTCOM;
{The new data structure needs a name, so the system asks the expert to supply
one. If he declines, it will create one, using the phrase delimited by the asterisks
and a few simple heuristics for creating mnemonic variable names.}
.ENDCOM;

  If someone said WHAT DO YOU MEAN BY NUTRIENT, what would you
  say?
  [type an empty line when done]
  ++**%3 CULTURE MEDIA COME WITH A WHOLE RANGE OF%*
  ++**%3 NUTRIENTS, AND SOME BACTERIA ARE VERY PARTIAL%*
  ++**%3 TO ONE NUTRIENT OR ANOTHER%*
  ++**
.STARTCOM;
{As before, the author of a new concept is responsible for describing it.}
.ENDCOM;

  Please give a phrase with which I can talk about the value 
  of NUTRIENT
  ++**%3 THE NUTRIENT OF THE MEDIUM OF *%*
.STARTCOM;
{All attributes have a phrase associated with them that can be used
to talk about their value.  It is used in many places throughout the system
(e.g., to produce the natural language version of the rules).  The asterisk is filled
in with the name of the relevant object, to produce, for instance, "the nutrient of
the medium of CULTURE-1."}
.ENDCOM;

      Now we want to select some keywords which might be good
      indicators that a speaker was referring to this new
      attribute. There are two classes of keywords (strong and
      weak) that are used to classify each of the non-trivial
      words in the phrase you just typed.
      Should NUTRIENT be a keyword?
      ++**%3 Y%*
      Strong?
      ++**%3 Y%*
      Should MEDIUM be a keyword?
      ++**%3 Y%*
      Strong?
      ++**%3 Y%*
      If you want to add any additional keywords,type them now.
      [type an empty line when done]
      ++**%3 ENERGY-SOURCE%*
      Strong?
      ++**%3 N%*
      ++** 
.STARTCOM;
{As described in chapter 5, natural language
capabilities are based on keywords selected from the English phrases associated
with some of the data types.  Here the dictionary is updated to include
the keywords from the new attribute.}
.ENDCOM;

  Is this attribute the result of a quantitative laboratory
  procedure?
  ++**%3 Y%*
.STARTCOM;
{The answer will be important during consultations:# Recall that for this type
of attribute, the consultation system asks for the value before
attempting to deduce it.}
.ENDCOM;

Which of the following best describes concepts like NUTRIENT
[Choose the last if no other is appropriate]
     1 - a single-valued attribute, or
     2 - a multi-valued attribute, or
     3 - an attribute whose value is "true" or "false", or
     4 - None of the above
Choose one
++**%3 1%*
.STARTCOM; TURN ON "{";
α{The expert has supplied all the information required by the %ACULATTRIB-SCHEMA%*,
and && now attempts to descend further down the schema network.  As
{YONFIG SHIER} illustrated, however, there is a three-way branch here.  Since the
system has no way of knowing which way to go, it asks the expert, who responds
by further specifying the concept he has in mind.}
.ENDCOM;

  Please give all the legal values for NUTRIENT
  The answer should be in the form of 1 or more of the
  following-   <a value of an attribute>
  ++**%3 BLOOD-AGAR%*
  ++**%3 THAYER-MARTIN%*
  ++** 
.STARTCOM;
{One of the components of an attribute is a list of the values
associated with it.  This is indicated in the relevant schema with the
slotname-blank-advice triple.
.SELECT A; <<so it causes a break>>
.ONCE FLUSH RIGHT <<otherwise it sticks out past left margin for some reason>>
LEGALVALUES   (KLEENE (1) < VALUE-INST >)    ASKIT
.CONTINUE;
which says that an attribute has one or more objects associated with
it that are of type %AVALUE%* and that they should be obtained from the
expert.}
.ENDCOM;

  Now I need to know a few things about each of these values...
  First BLOOD-AGAR
.STARTCOM;
{Since they are new to the system, the expert is asked to describe each of them.
Since the triple shown above indicates that each is an instance of the
%AVALUE%* schema, the system starts by retrieving that schema and filling it
in.}
.ENDCOM;

    Please give the full, formal name for "BLOOD-AGAR"
    ++**%3 BLOOD-AGAR%*
    Now please give all synonyms or abbreviations for 
    BLOOD-AGAR which you would like the system to accept:
    ++** 

Which of the following best describes concepts like BLOOD-AGAR
[Choose the last if no other is appropriate]
     1 - identity of an organism, or
     2 - the site of a culture, or
     3 - None of the above
Choose one
++**%3 3%*
.STARTCOM; 
.TURN ON "{";
α{After finishing with the %AVALUE-SCHEMA%*, the system is again faced with a
choice of paths (see {YONFIG SHIER}) and must ask for advice.  Since two different
attributes can have values in common, there is no guarantee that the values
of this new attribute are unique.  Also, since the phrases like "the site of
a culture" are simply strings associated with each schema, the
system has no way of knowing what each choice represents.  The expert indicates
that the concept he is talking about is new by choosing the third item.}
.ENDCOM;

Ok, then you'll have to tell me a few things about it...

***************************************************************
.SKIP 1;
.END "TRACE"
	Let's take a moment out to review what's happened so far and to see
where we are going from here. (A revised version of {YONFIG SHIER} is reproduced
below for reference.  It includes an indication of the path this example takes
through the network and omits several other network branches for clarity.)
.ind pattern matching
	The system was able to use its pattern-matching routines
to guess from the dialog that the new object being discussed was an
attribute of a culture.  The expert verified this guess and the system
used as its starting point the %ACULATTRIB-SCHEMA%*, since it was the schema
associated with the pattern that matched. From there (indicated in {YONFIG PATH} by the
asterisk), the system "climbed" up one level in the network∪∪Since
the %AKSTRUCT-SCHEMA%* is empty, there is no need to go all the
way to the root.∪ and started
back down.  The first schema to be filled out is the %AATTRIB-SCHEMA%*; this
supplied the direction for the initial part of the dialog. Then, since the path
had been marked (during the ascent), it descended to the %ACULATTRIB-SCHEMA%*
and used that to continue the dialog.
.STARTFIG;
		 KSTRUCT-SCHEMA

   VALUE-SCHEMA			  ATTRIB-SCHEMA
     ??

SITE-       IDENT-      	   
SCHEMA      SCHEMA

  [nutrient-schema]


              PTATTRIB-   INFATTRIB-   CULATTRIB-   ORGATTRIB-
	      SCHEMA	  SCHEMA       SCHEMA [*]   SCHEMA
					 ?



		      SVA-SCHEMA  MVA-SCHEMA  TFA-SCHEMA

.FIG(Part of the schema hierarchy,PATH:);
.ENDFIG;
	At that point, the system encountered a branch in the network for which
it had no directional information (indicated by the single question mark) and
hence had to ask "%AWhich of the following best describes concepts like
NUTRIENT?%*"# The process continued after the expert indicated the correct
choice.
	In filling out this schema, however, the system encountered a link to
another type of data structure. Since each attribute carries with it a
list of its associated values, constructing a new attribute leads to
the acquisition of new values.  As shown in the trace, this is triggered by
encountering
.STARTCENTERIT(A)
LEGALVALUES   (KLEENE (1) < VALUE-INST >)   ASKIT
.ENDCENTERIT;
in the %ASVA-SCHEMA%*.  After listing all of the associated values, the
expert was asked to describe each.  The description task is set up as a
subproblem (indicated by the dashed line) with the starting point in the
network given by the schema named in the triple.
	Consider the description of the first value.  The system started with
the %AVALUE-SCHEMA%*, but then reached a branch point for which it had no
information (the double question mark) and again had to ask the expert
"%AWhich of the following best describes things like BLOOD-AGAR?%*"#
This time, however, the expert indicated that the object being acquired was
of a type not yet known to the system.  The acquisition of a new schema to
describe the new data type is then set up as a sub-subproblem.  
	The next step in the dialog, then,  will be the description of the
new "data type" %ANUTRIENT%*, accomplished by filling out the %ASCHEMA-SCHEMA%*
to produce the %ANUTRIENT-SCHEMA%*.  This will become a part of the network as a
new branch below the %AVALUE-SCHEMA%*. Note that the network thus "evolves,"
growing larger in a relatively smooth and natural way as the number of data
types increases.  In this particular case, almost all of this operation can be
carried out by the system itself, and it is thus almost totally transparent to the
expert.
	The process then returns to the subproblem of
describing the values and continues with %ABLOOD-AGAR%* where it left off. That is,
after describing the new data type (sprouting the new branch of the network),
the process will continue down into that branch and will request a description
of the first instance of the new data type, using the new schema as a guide.
	After all the values have been described, the dialog goes back to finish
up acquisition of the new attribute.  The entire sequence of topics is shown below.
.STARTFIG;
New rule %2acquisition requires a%*
    new attribute %2 which has its associated%*
	  new values.  %2But these are a%*
		new data type %2which means adding a new schema to the%*
			      %2network.%*
	  New values %2then instantiate the new schema, and finally the%*
    new attribute %2is finished up, so the system returns to the%*
new rule.
.ENDFIG;
	Let's continue with the trace, picking it up at the point where the system requested
help in choosing the path to follow below the %AVALUE-SCHEMA%*.

.BEGIN "TRACE" STARTRACE;

Which of the following best describes things like BLOOD-AGAR
[Choose the last if no other is appropriate]
     1 - identity of an organism, or
     2 - the site of a culture, or
     3 - None of the above
Choose one
++** %33%*

Ok, then you'll have to tell me a few things about it...
.STARTCOM;
{&& retrieves the %ASCHEMA-SCHEMA%* to direct the acquisition of
the new data type.}
.ENDCOM;

    Please give a phrase (like the ones above) to describe this 
    new type of value.
    ++** %3THE NUTRIENT OF A CULTURE MEDIUM%*


    Ok, finished describing the new type of value; now back to
    BLOOD-AGAR
.STARTCOM;
{In this case, most of the task of constructing the new schema can be handled by
the system itself, so the dialog returns to the problem of describing the new
instances of the (new) data type.}
.ENDCOM;

  Finished with BLOOD-AGAR; now THAYER-MARTIN
.STARTCOM;
{And there's nothing else to do in describing the first value, so the system
continues with the second.}
.ENDCOM;

    Please give the full, formal name for "THAYER-MARTIN"
    ++** %3THAYER-MARTIN%*

    Now please give all synonyms or abbreviations for 
    THAYER-MARTIN which you would like the system to accept:
    ++** %3CHOCOLATE-AGAR%*
    ++** %3TM%*
    ++** 

    Finished with THAYER-MARTIN
  OK done with the values for NUTRIENT now...
.STARTCOM;
{The second instance is likewise described in short order, and the dialog
returns to the original problem--describing the new attribute.  There
is just one more part to be acquired.}
.ENDCOM;

    Please give a phrase which can be used to ask about the
    value of NUTRIENT
    [Type as many answers as are appropriate, then an empty 
     line.]
    ++** %3WHAT IS THE NUTRIENT OF THE MEDIUM FOR *%*
    ++** 

Ok, finished with describing the new attribute now.
Returning to parsing the new rule.

This may take a few moments...


This is my understanding of your rule:
RULE383
-------
  If:  1) The gramstain of the organism is gramnegative, and
       2) The morphology of the organism is rod, and
       3) The patient is a compromised host, and
       4) The nutrient of the medium of the culture is 
	  blood-agar
  Then:  There is strongly suggestive evidence (.8) that the 
	 category of the organism is enterobacteriaceae


Okay? (Yes or No)
++** %3Y%*
.STARTCOM;
{Finally, we have the new rule, with its new attribute and value.}
.ENDCOM;
.END "TRACE"
.SSS(Comments on the trace);
	The creation of the new attribute is an extensive
operation involving several different data types.  By structuring it properly,
however, the task becomes comprehensible.  There are several sources of
this structuring information.  First, the schemata take advantage of the
decomposability of individual data types to present a series of straightforward,
independent questions.  Next, the schema network relies on the fundamentally
simple organization of the data types to provide a comprehensible sequence of
topics.  The slots and slotexperts, in turn, make it possible to represent many
conventions of the data types in ways that permit the system to perform
many of the routine tasks, considerably simplifying the entire operation.
Finally, the correspondence between data types and objects in the domain makes
it possible to present a dialog that appears comprehensible to the expert, yet
which deals effectively with questions of data structure manipulation.  The result of
all this is the construction of some complex data structures with numerous
internal conventions and interrelationships, in a fashion that makes it a
reasonable task for the expert.
	The growth of the schema network to encompass a new data type
.ind flexibility
demonstrates the degree of flexibility in the system.  The flexibility arises
from the use of the schemata as a language and framework for the specification
of representations.  Knowledge about any specific representation is contained
entirely in the "statements" of that language, rather than in special purpose
code.  This provides a greater range of applicability and flexibility than would
be possible if separate, hand-tailored acquisition routines were written for
each different data type.
	Additional flexibility arises from the inherently extensible nature
of the schema network.  As with all generalization hierarchies, it is a
relatively simple operation to add new branches at any level in the network.
Since the representation language interpreter "reads" the network to structure
the dialog, the addition will be reflected in future acquisition sessions.
That is, the next time && reaches the %AVALUE-SCHEMA%* node and requests
advice about which way to go, it will present the expert with four options:# the
three shown in the previous trace plus the new one of culture medium nutrient.
	The trace also demonstrates that the description of the structure of one
data type may mention another (as the description of an %2attribute%*
mentions %2values%*).  In the acquisition process this gets translated into a new
direction for the dialog, as one topic (describing the new attribute) leads
naturally into another (describing its associated values).  These "new
directions" are currently followed as they arise (i.e., the search is
depth-first).  This can prove to be a distraction at times, since the
dialog goes off on a subtopic and later returns to the main topic to finish
up.  This could easily be changed to a modified breadth-first search, which
would result in a dialog that exhausted each topic (each data structure) in
turn before beginning another.
	One final comment concerns the simplicity of acquiring the new schema that
describes the values of the new attribute.  There are several reasons
why the operation is in this case almost totally transparent to the expert when
in general it is a much more complex operation.  It is in part a fortuitous side
effect of the conventions used in the current set of representations.  Most of
the important conventions concerning the representation of a value are
common to all values and hence are expressed in the network at the level
of the %AVALUE-SCHEMA%*.  There is thus relatively little more that the
schemata at lower levels have to add.
	This transparency also results from the assumption that the expert will not be
expected to make changes in the basic formalism of the performance program.  In
line with this assumption, when dealing with the expert, the schema interpreter does not
request two types of information that would normally be part of the description
of a new data type:# substructure and interrelationships.  Wherever a new schema
is added to the network, there is the possibility that the data structure it
describes has additional substructure and interrelationships beyond those
described by its ancestors in the network.  To be complete, the system should
naturally ask about them.  But notice that the answer to either the
question of substructure or interrelationship requires a knowledge of, and
implies potential alterations to, the underlying formalism of the
performance program.  Any substructure in a new data type would have to be
referenced somewhere in the performance program if it is to be of use,
implying that the performance program code would have to be changed.  The
ability to specify new interrelationships between data types implies an
understanding of the data types that already exist.  Since both of these
clearly require an understanding of elements of the system that would be
alien to the expert, they are omitted.  (We will see later that they are
asked under other circumstances.)
	This approach allows the expert to teach the system about new attributes
and values without getting involved in programming details.  The
price is a small possibility that he may compromise the integrity of the data
base, if the new data type in fact should be related  to some existing structure.
	There is really a more fundamental problem here:
The current design of {SYSTM MYCIN}'s representations makes each kind of
%AVALUE%* its own data type.  This is what makes it necessary to acquire a new
schema and pushes the task into the realm of changes to the performance
program.  With some redesign of the data structures involved, it would be
possible to have just a single kind of %AVALUE%* data type, and avoid all
this.  But as indicated, it was necessary to work within the existing
representations in {SYSTM MYCIN} and still make it possible for the expert to
educate the system.
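	The distinction can be suggested with a small sketch (a hypothetical
illustration in modern Python; the actual structures are of course quite
different, and the illustrative contents are not drawn from the knowledge base).
In the current design, each kind of value amounts to a separate declaration, so
a new kind requires a new schema; a single parameterized value type would
instead allow new kinds to be added as ordinary data.
.STARTFIG;
# Hypothetical contrast, not the actual MYCIN structures.

class SiteValue:                        # one data type per kind of value:
    def __init__(self, name):           # adding "identity" meant describing
        self.name = name                # a whole new type (a new schema)

class IdentityValue:
    def __init__(self, name, gram=None, morph=None, synonyms=()):
        self.name, self.gram, self.morph = name, gram, morph
        self.synonyms = list(synonyms)

class Value:                            # a single parameterized type: a new
    def __init__(self, kind, name, **properties):   # kind of value is just
        self.kind = kind                             # another piece of data
        self.name = name
        self.properties = properties

blood = Value("site of a culture", "BLOOD")
ecoli = Value("identity of an organism", "E.COLI",
              GRAM="GRAMNEG", MORPH="ROD", SYNONYM=["ECOLI"])
.FIG(One type per kind of value versus a single value type);
.ENDFIG;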

.SS(Knowledge about knowledge about representations,KAKAR:);
.ind knowledge about representations
.ind interactive transfer of expertise
	&& was designed to make possible interactive transfer of expertise. As
we have seen, one kind of expertise it can transfer is domain-specific
information, the kind supplied by an expert to improve the operation of a
performance program.  But recall that high performance on the transfer of
expertise task required a store of knowledge about representations. If && is
designed to make possible interactive transfer of expertise independent of
domain, why not apply it to the task of acquiring and maintaining the requisite
base of knowledge about representations?  That is, why not push this back a
level and consider the knowledge about representations as a candidate for
interactive transfer of expertise?#
This has been done and involves using && in two phases ({YONFIG TAR}).
	As we have seen, the domain expert uses && to teach the performance program
about the domain of application.  High performance on this task is made possible
by the base of knowledge about representations provided by the schemata.  But
the system architect can also use && to teach about a particular set of
representations.  High performance on this task is made possible by the
%2schema-schema%*, a base of "knowledge about knowledge about representations,"
which is used to guide the process of describing a new
representation.
It is, in effect, a set of instructions describing how to specify a
representation.  Since the instructions are in the same format as those
in an ordinary schema, the process of following them is identical.  As a result,
we need only a single "schema interpretation" process.  Teaching about a
representation (acquiring a new schema) is thus computationally identical to
teaching about the domain (acquiring a new instance of a schema); indeed, both
teaching tasks shown in {YONFIG TAR} are done with a single body of code.
.STARTFIG;BOXFIG; TURN OFF "%";

            teaching about        teaching about the
           a representation      domain of application
⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃      ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃       ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃
}knowledge of   }      }knowledge of   }       }              }
}representation-}      }primitives for }       } object-level }
}independent    }= = @ }a specific     } = = @ }knowledge base}
}primitives     }      }representation }       }              }
%∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$      %∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$       %∂∂∂∂∂∂∂∂∂∂∂∂∂∂$

   SYSTEM 2                SYSTEM 1                SYSTEM 0

.FIG(The two applications of schema instantiation,TAR:);
.ENDFIG;
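	The computational identity of the two phases pictured above can be
suggested with a brief sketch.  The fragment below is rendered in modern Python
rather than the system's INTERLISP, and the slot and advice handling is
drastically abbreviated; the point is only that one instantiation routine,
applied to the schema-schema, yields a new schema, and applied to that schema,
yields an ordinary knowledge base entry.
.STARTFIG;
# One instantiation routine serves both levels: filling in the schema-schema
# produces a new schema; filling in that schema produces a domain instance.

def instantiate(schema, ask):
    """Fill each slot of schema according to its advice (sketch only)."""
    name = schema.get("NAME") or schema.get("PNTNAME", "?")
    instance = {"INSTOF": name}
    for slot, (blank, advice) in schema.get("SLOTS", {}).items():
        if advice == "ASKIT":
            instance[slot] = ask(name, slot, blank)
        elif advice == "CREATEIT":
            instance[slot] = f"<system-generated {blank}>"
        # FINDIT, GIVENIT, INSLOT, etc. are omitted from this sketch
    return instance

SCHEMA_SCHEMA = {
    "NAME": "SCHEMA-SCHEMA",
    "SLOTS": {"PNTNAME": ("ATOM",    "ASKIT"),
              "DESCR":   ("STRING",  "ASKIT"),
              "AUTHOR":  ("ATOM",    "ASKIT"),
              "DATE":    ("INTEGER", "CREATEIT")},
}

def ask(owner, slot, blank):
    return input(f"{owner}: please supply {slot} ({blank}): ")

# Phase 1 -- the system architect describes a new representation:
ident_schema = instantiate(SCHEMA_SCHEMA, ask)
ident_schema["NAME"] = ident_schema["PNTNAME"]
ident_schema["SLOTS"] = {"GRAM":  ("GRAM-INST",  "ASKIT"),
                         "MORPH": ("MORPH-INST", "ASKIT")}

# Phase 2 -- the domain expert describes an instance of that representation,
# using exactly the same routine:
new_identity = instantiate(ident_schema, ask)
.FIG(One instantiation routine used at both levels);
.ENDFIG;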
	Earlier sections of this chapter displayed three examples from the
process of teaching about the domain and demonstrated how each could be
understood in terms of filling out one or more schemata.  This section explores
an example of the first process--teaching about a representation--and views it
in terms of %2augmenting the schema network%*.
.ind inheritance of properties
	This view is useful both computationally and conceptually.
The computational task is simplified because much is accomplished by adding a
single branch to the schema network, which is an information-rich structure. The
new schema will inherit all of the information represented in its ancestors in
the network and hence need not replicate it.  The task becomes easier
conceptually since the network offers a useful framework for organizing and
understanding all the different representations in a program.
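	The inheritance involved can itself be sketched (modern Python, with
hypothetical structures whose names echo those in the actual network): the
structure of a data type is the union of its own slot descriptions and those of
all its ancestors, so a newly added schema need state only what it adds.
.STARTFIG;
# Sketch of inheritance in the schema network.

def inherited_structure(schema):
    """Collect slot descriptions from schema and all of its ancestors."""
    slots = {}
    node = schema
    while node is not None:
        for slotname, description in node["SLOTS"].items():
            slots.setdefault(slotname, description)   # nearer definitions win
        node = node.get("FATHER")
    return slots

kstruct  = {"NAME": "KSTRUCT-SCHEMA", "FATHER": None,
            "SLOTS": {"PNTNAME": "ATOM", "DESCR": "STRING"}}
value    = {"NAME": "VALUE-SCHEMA", "FATHER": kstruct,
            "SLOTS": {"VAL": "PNTNAME"}}
identity = {"NAME": "IDENT-SCHEMA", "FATHER": value,
            "SLOTS": {"GRAM": "GRAM-INST", "MORPH": "MORPH-INST"}}

print(inherited_structure(identity))
# -> PNTNAME, DESCR, VAL, GRAM, MORPH: only the last two were stated locally
.FIG(A sketch of inheritance through the network);
.ENDFIG;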
	This approach assumes, of course, that the different data types in a
program can, in fact, be organized into a generalization hierarchy.  If this is
true, the hierarchy serves as another tool for dealing with
complexity, providing a useful
organizational overview when there are a number of related data types.
	Building the schema network also provides one useful test of the
generality of this part of the system.  If the techniques used are sufficiently
general, it should be possible to grow the entire network from a foundation
that is not specific to any particular representation.  This was made one of
the design criteria for the system, and has provided useful guidance.
	We will see that the schemata
have been applied to a variety of data structures. This will help make plausible
the claim that they form a useful tool for attacking one central problem faced
in building knowledge bases: the construction and maintenance of large
collections of varied data structures.
	Since this  stage of knowledge
base construction deals with the process of describing representations, we do
not expect that it would be accomplished by the expert from the application
domain.  The task requires a knowledge of programming and may require changes
to the basic formalism of the performance program.  In this use, then, the
schemata are more properly viewed as a "programmer's assistant" tool, to be used
by someone with the appropriate background.  The language of the next dialog will
reflect this new orientation, since it assumes a familiarity with both general
programming issues and the language of the schemata.

.SSS(The SCHEMA-SCHEMA,SCHSCH:)
	As noted earlier, in our framework the process of describing a new
representation can be made computationally identical to that of describing
new instances of a representation. This uniformity is made possible by the
schema-schema, shown below.
	The schema-schema, along with some associated structures, provides a
foundation of representation-independent knowledge that can be used for
constructing an entire knowledge base.  The nature and extent of this knowledge
is outlined below, to characterize the assumptions behind the use of the
schema-schema and, hence, the range of representations for which it is
applicable.
	Knowledge embedded in the schema-schema assumes that:
.BEGINLIST;
	(1)\Data structures have a well-specified syntax.  That is, they have a certain
static quality and maintain the same structure and organization over a lifetime
that includes a number of access, storage, and creation operations.  Obvious
candidates are structures that do not change while the program is executing;
conversely, applying the approach to temporary structures that are quickly
modified would be less successful.
.SKIP 1;
(2)\Data structures can be specified in terms of distinct sub-units, each of
which has a straightforward syntax and is for the most part independent of
the others.
.SKIP 1;
(3)\Data structures may be interdependent.  Thus, part of the task of specifying
a new representation is to describe any interrelationships it may have with
other structures.
.SKIP 1;
(4)\There is more than one instance of each data type.  The utility of the
schema as a tool for dealing with program complexity is dependent on a
useful instance-to-schema ratio.  If every data structure in a program were
distinct (a 1:1 ratio), the schemata would offer little advantage in
knowledge base maintenance.
.ENDLIST;
Since the schemata were devised as an extension to the notion of a
record structure, it is not surprising to find that several of these assumptions
are common to the use of record structures as well.
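	The relationship can be seen in a small sketch (modern Python, hypothetical
contents).  An ordinary record declaration fixes the fields of a structure and
their types; a schema pairs each field with a blank describing its shape and
advice saying where its contents come from, and adds bookkeeping and
interrelationship slots for which a record declaration has no place.
.STARTFIG;
from dataclasses import dataclass

@dataclass
class OrganismIdentity:        # a conventional record: fields and their types
    pntname: str
    gram: str
    morph: str

# A schema says, in addition, how each part is shaped (the blank) and where
# its contents come from (the advice), and records interrelationships:
IDENT_SCHEMA_SKETCH = {
    "PNTNAME":   ("ATOM",       "CREATEIT"),
    "GRAM":      ("GRAM-INST",  "ASKIT"),
    "MORPH":     ("MORPH-INST", "ASKIT"),
    "RELATIONS": [("ADDTO", ("AND*", "ORGANISMS"))],
}
.FIG(A record declaration and its schema counterpart);
.ENDFIG;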
.SKIP TO LINE 1; STARTFIG;turn on "↓_"; << the schema schema >>
.group skip 7;
.SELECT 6;
↓_SCHEMA-SCHEMA_↓

PNTNAME      (ATOM      CREATEIT) 
STRUCT       (PNTNAME   INSLOT)
PLIST 	  
       [ (PNTNAME   ((BLANK-INST ADVICE-INST) ASKIT)
          STRUCT    ((PNTNAME INSLOT)         GIVENIT)
          PLIST    
                 [ (INSTOF       (( (PNTNAME INSLOT) GIVENIT )            CREATEIT)
                    DESCR        ((STRING  ASKIT)                         GIVENIT)
                    AUTHOR       ((ATOM    ASKIT)                         GIVENIT)
                    DATE         ((INTEGER CREATEIT)                      GIVENIT)
                    KLEENE       ((SLOTNAME-INST (BLANK-INST ADVICE-INST)) ASKIT))
		  CREATEIT]
          FATHER    ( SCHEMA-INST             FINDIT)
          INSTANCES ( LIST                    ASKIT)
          STRAN     ( STRING                  FINDIT)
          INSTOF    ( SCHEMA-SCHEMA           GIVENIT)
          DESCR     ( STRING                  CREATEIT)
          AUTHOR    ( ATOM                    ASKIT)
          DATE      ( INTEGER                 CREATEIT)
          OFFSPRING ( (KLEENE (0) < SCHEMA-INST >)  ASKIT)
          RELATIONS ((KLEENE (0) <(UPDATECOM-INST KLEENE (1)
			          <(SWITCHCOM-INST KLEENE (1) <KSTRUCT-INST>)>)>)
			     ASKIT))
	  CREATEIT]
  FATHER     (SCHEMA-SCHEMA)
  INSTANCES  ((ALLSCHEMA))
  STRAN      "knowledge structure"
  INSTOF     (SCHEMA-SCHEMA)
  DESCR      "the schema-schema describes the format for all other schemata"
  AUTHOR     DAVIS
  DATE       876
  OFFSPRING  NIL
  RELATIONS  ((ADDTO (AND* ALLSCHEMA)))
.SELECT 5;
.SKIP 5;
.FIG(The schema-schema);
.ENDFIG;
. xgenlines← xgenlines+1;  COMMENT compensate for font 6;
.SKIP TO LINE 1;  TRACESEC( Building the schema network,BSN:);
	This example demonstrates the process of adding a new schema to the
network in the very early stages of knowledge base construction.  At this point
the system does not yet know anything about the concept of organism identity.
.BEGIN "TRACE" STARTRACE;

Schema network editor
Command?
++** %3DOWN%*
    Level 1:
	1) attribute
	2) value of an attribute
.STARTCOM;
{There is a simple editor that allows the user to examine the network and
perform various operations on it.  Here he descends from the root...}
.ENDCOM;

++** %3SHOW 2%*

VALUE-SCHEMA
------------
  PNTNAME       (ATOM      ASKIT)
  VAL           (PNTNAME   INSLOT)
  PLIST         [(INSTOF  (VALUE-SCHEMA     GIVENIT)
	          DESCR   (STRING           ASKIT)
                  AUTHOR  (ATOM             FINDIT)
                  DATE    (INTEGER	    CREATEIT))
		 CREATEIT]
  STRAN         the value of a attribute
  FATHER        (KSTRUCT-SCHEMA)
  OFFSPRING     (SITE-SCHEMA)
  DESCR         the VALUE-SCHEMA describes the format for a
		value of any attribute
  AUTHOR        DAVIS
  DATE          1115
  INSTOF        (SCHEMA-SCHEMA)
.STARTCOM;
{examining schemata along the way...}
.ENDCOM;

++** %3DOWN 2%*
    Level 2:
	    2.1) site of a culture

++** %3ADD 2.2%*
.STARTCOM;
{until he has found the right spot and then indicates that he wants to add a new
schema.  The system retrieves the %ASCHEMA-SCHEMA%* and uses it to guide the
process of describing the new data type.}
.ENDCOM;

Please give a phrase to describe this new type of value of an 
attribute.
++** %3IDENTITY OF AN ORGANISM%*

    If there is one word which neatly sums up "identity of an 
    organism", please give it (else press RETURN)
    ++** %3IDENT%*
.STARTCOM;
{The system needs a name for the new schema and asks the user first.  If he
declines to supply one, the system will manufacture one out of the phrase
"identity of an organism."}
.ENDCOM;

    Please describe any additional properties of a identity of
    an organism. The answer should be in the form of one or
    more of the following-
        <slot name> <blank specifier> <slot advice>
    [Type 1 set to a line, then an empty line when done.]
    ++** %3AIR     (KLEENE (1 1 2) <(AIR-INST CF-INST)>)   ASKIT%*
    ++** %3GRAM       GRAM-INST     ASKIT%*
    ++** %3MORPH     MORPH-INST     ASKIT%*
    ++** %3SYNONYM     (KLEENE (1 0) < ATOM >)    ASKIT%*
    ++** 
.STARTCOM;
{The user indicates several structural components that are part of the new data
type, describing them in the standard slotname-blank-advice format.  These are
in addition to the structural conventions it inherits by virtue of being a type
of %AVALUE%*.}
.ENDCOM;

    Sorry, but the following are invalid -
        SYNONYM is not a known <slot name>
    Please answer again [use the same answer if you really
    meant it.]

    ++** %3SYNONYM (KLEENE (1 0) < ATOM >) ASKIT%*
.STARTCOM;
{The concept of a synonym is as yet unknown to the system; so it too has to be
described.  This is set up as a subproblem, and the %ASLOTNAME-SCHEMA%* is
used to guide the description.}
.ENDCOM;

      Please tell me a few things about the concept of SYNONYM
      as a <slot name>.

        Please give a short phrase which can be used to ask for
        the contents of this slot.
        [type an empty line when done]
        ++** %3PLEASE GIVE ALL SYNONYMS OR ABBREVIATIONS%*
	++** %3FOR * WHICH YOU WOULD LIKE THE SYSTEM TO%*
	++** %3ACCEPT.%*
	++**
.STARTCOM;
{Recall that the asterisk is used to indicate a gap to be filled in the
template.  In this case it will be filled in with the name of the new identity
being acquired.}
.ENDCOM;

        Please give a short phrase which can be used to display
        the contents of this slot.
        [type an empty line when done]
        ++** %3THE SYNONYMS OF * ARE%*
	++**

        Please give a description of SYNONYM.
        [type an empty line when done]
        ++** %3SINCE MANY ORGANISM NAMES ARE LONG AND%*
        ++** %3UNWIELDY, SHORTER SYNONYMS ARE OFTEN%*
	++** %3USED. THOSE SYNONYMS ARE PART OF THE%*
        ++** %3DATA STRUCTURE WHICH REPRESENTS AN%*
        ++** %3ORGANISM IDENTITY.%*
        ++**

        Please edit and complete this skeleton function
        definition for the SYNONYM-EXPERT:

[NLAMBDA (BLANKS ADVICE)
  (SELECTQ ADVICE
	   (ASKIT (ASK-XPERT BLANKS (QUOTE SYNONYM)))
	   (GIVENIT BLANKS)
	   (FINDIT  )
	   (CREATEIT )
	   (INSLOT (APPLY* (GETEXPERT BLANKS) SCHEMA
					      (QUOTE GETALL)))
	   [GETONE (CAR (GETP KSNAME (QUOTE SYNONYM]
	   (GETALL (GETP KSNAME (QUOTE SYNONYM)))
	   (GETNEXT (NEXTONE SCHEMA (QUOTE SYNONYM)))
	   (NOADVICE BLANKS (QUOTE SYNONYM-EXPERT]
        tty:
	*
.STARTCOM;
{Since there is a slot-expert associated with every slotname, the
acquisition of the slot-expert becomes a new sub-task.  Recall that even though
it is a function, the slot-expert is viewed for the moment as simply another data type, one
of whose components is a function definition.  All of the other components are
sufficiently stylized that they can be manufactured by the system itself, and
this occurs without aid from the user.
	The function definition is complex enough to be an exception
to this, but even here there is enough stylization that the system can prepare a
useful skeleton to be completed by the user.  The standard
%6INTERLISP%2 editor is invoked (announcing itself with the
"%Atty:%*" prompt), to allow the user to make any necessary changes.  Since not
every piece of advice makes sense for every slot-expert, the user may delete
some of the entries.  Other entries may be expanded to account for additional
representation conventions, or edited because the original skeleton is at best a
rough guess.  The point in having the system produce the skeleton is not to
automate the creation of code, but rather to make it as easy as possible for
the user to supply all the information that the system will eventually need.}
.ENDCOM;

	.
	.
	.

	* %3OK%*
.STARTCOM;
{The user finishes the editing (which has been omitted here).}
.ENDCOM;

        Done with the concept of SYNONYM as a <slot name> now.
.STARTCOM;
{Having finally finished with the new slotname, the dialog returns to the
last item needed for the new schema ...}
.ENDCOM;

    Please specify all updating to other data structures which
    will be necessary when a new instance of a identity of an
    organism is acquired.  The answer should be in the form of
    one or more of the following-
    <update command> [1 or more: <selection command> 
			       [1 or more: <data structure>]]
    [Type 1 set to a line, then an empty line when done.]
    ++** %3ADDTO (AND* ORGANISMS)%*
    ++**

    Ok, finished defining IDENT-SCHEMA.

  Level 2:
	    2.1) site of a culture
	    2.2) identity of an organism
Command?
++**
.STARTCOM;
{... and then is done.}
.ENDCOM;
.END "TRACE"
.SSS(Comments on the trace,COTT:);
	The system requires only a very small core of knowledge as the basis
for the schema network construction shown in this trace.  In addition to the
network editor and the schema interpreter, it requires only five schemata and a
small number of instances.∪∪Since the schema-schema needs to refer to the concepts of
slotnames, slot-experts, advice, and blanks, the schemata for these must be
supplied and cannot be bootstrapped.  The instantiations required are the
slotname and slot-experts for each of the slotnames found in a schema, and
instantiations of the advice schema for the nine pieces of advice.∪
From this core of knowledge everything else can
be built.  As a demonstration, the network shown above in {YONFIG SHIER} was
constructed in this fashion.  The single process of schema interpretation was
used to guide the construction of the base of representation-specific knowledge
and then used to instantiate it in order to build a small object-level knowledge base.  The
system was thus bootstrapped from the schema-schema and a few associated
.ind bootstrapping the knowledge base
structures.
	In practice, the content of the knowledge base would not already be
determined, so the process would proceed slightly differently.  A basic
skeleton of the schema network should be constructed first, using the network
editor.  After a few major branches have been supplied, it is then
convenient to go back to typing in new rules and to allow the system to guide
the necessary network growth.  As in the example of acquiring a new attribute,
this means that a new rule might trigger the addition of a new branch to the 
network (the new data type) and then trigger several instantiations of it (the nutrients).
	While rules can be entered from the
very beginning of knowledge base construction,
this tends initially to produce dialogs that are difficult for the user to follow.
With an empty schema network, the first line of the first rule will trigger a
long and deeply recursive dialog.  In general, early in knowledge base
construction, the smallest addition tends to trigger many other additions.  It
is easiest to start by building a basic network a piece at a time with the
editor.
	Since the network can conceivably grow quite complex, some
simple heuristics have been embedded in the editor.  To help deal with the problem
of potential interconnections of a new schema, the editor can propose candidates.
If a new schema were added in the third level of the network of {YONFIG SHIER},
the editor would suggest:
.STARTFIG;
Listed below are 1 or more possible sub-classifications of
this new concept. Please indicate [Y or N] each one that 
applies.
  1 - an attribute whose value is "true" or "false"
  ++** %3YES%A
  2 - a multi-valued attribute
  ++** %3YES%A
  3 - a single-valued attribute
  ++** %3YES%A
.FIG(Proposing potential connections);
.ENDFIG;
	The editor examines the siblings of the new schema and notes
to which other schemata they are connected.  If a sufficient percentage of
the siblings share a common offspring, that offspring becomes a potential
connection in the schema network.  In this case the editor (correctly) proposes the
three schemata to which all five of the other siblings are attached.
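	A minimal sketch of this heuristic follows (modern Python; the threshold
and the schema names are hypothetical stand-ins for "a sufficient percentage"
and for the actual siblings).
.STARTFIG;
# Sketch of the editor's proposal heuristic: offspring shared by enough of the
# new schema's siblings are offered as candidate connections for it as well.

def propose_connections(siblings, threshold=0.5):   # threshold is hypothetical
    counts = {}
    for sib in siblings:
        for child in sib["OFFSPRING"]:
            counts[child] = counts.get(child, 0) + 1
    return [child for child, n in counts.items()
            if n / len(siblings) >= threshold]

siblings = [{"NAME": "SIBLING-1",
             "OFFSPRING": ["TF-ATTR", "MULTI-VALUED-ATTR", "SINGLE-VALUED-ATTR"]},
            {"NAME": "SIBLING-2",
             "OFFSPRING": ["TF-ATTR", "MULTI-VALUED-ATTR", "SINGLE-VALUED-ATTR"]}]
            # ... plus the remaining siblings of the new schema

print(propose_connections(siblings))
.FIG(A sketch of the connection-proposing heuristic);
.ENDFIG;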
	An analogous sort of aid is available when specifying the structure
of a new data type.  The network editor examines the three structure-defining
slots (the print name, value, and property list) of the new schema's siblings
and detects regularities in a manner similar to the way rule models are created.
These are then displayed to the user and, like the rule models, can be useful
reminders of  overlooked details.

.SS(Levels of knowledge,LOK:);
.beginind knowledge about representations
	The mechanisms reviewed above provide an extensive amount of
machinery for encoding knowledge about representations.  But it is not enough
simply to provide the machinery--if the result is to be something more than
"yet another knowledge representation formalism," there must be some sense of
organization and methodology that suggests how all this ought to be used.
.ind levels of knowledge
	Organization is provided by a common theme that serves to
unify all of the proposed mechanisms: the notion of %2levels of
knowledge%*.  There are several different (and independent) stratifications
of knowledge implicit in the formalism developed above.  Two of the most
important involve:
.BEGINlist; 
	(a)\describing knowledge in the system at different levels of detail, and 
	(b)\classifying it according to its level of generality.  
.ENDlist;
In both cases, the important contribution is a framework for organizing the
relevant knowledge about representations.  The idea of different levels of
detail indicates that representations (e.g., %2value%* or %2attribute%*)
can be described at the level of:
.beginlist;
\global organization (as in the schema hierarchy),
\logical structure (as in the schemata), and
\implementation (as in information associated with the slotnames).
.endlist;
These levels provide an organizational scheme that makes it
easier to specify and to keep track of the large store of information about
representations required by the acquisition task.  The different levels of
generality for classifying knowledge include:
.beginlist
\domain specific,
\representation specific, and
\representation independent.
.endlist
As explained below, the idea of maintaining clear distinctions between
these different kinds of knowledge is an important contributor to much of
&&'s current range of application.

.SSS(Level of detail);
	As noted in {YON1 KARO}, the schema hierarchy, individual schemata,
and slotnames each encode their own form of knowledge about representations.
The hierarchy indicates the global organization of representations in the
system and provides a foundation for both the acquisition of new instances of
existing primitives (a process of descent through the hierarchy and
instantiation of the schemata encountered) and the acquisition of new kinds of
primitives (a process of adding new branches to the hierarchy).∪∪Note that
the entire schema hierarchy is viewed here as dealing with information at a
single level of detail (viz., global organization of representations).
Viewed by itself, it is of course yet another (independent) structuring of
knowledge in the system into various levels.∪ The schemata describe the
logical structure and logical interrelationships of individual
representations and, as prototypes, provide a focus for the organization of
knowledge about a representation.  The slotnames have associated with them
information concerning the implementation of a specific representation,
information at the level of programming-language constructs and conventions
(e.g., variable name uniqueness, etc.).

.SSS(Level of generality);
	Much of the range of applicability of && results from
the isolation and stratification of the three kinds of knowledge shown below.
The base of %2domain-specific%* knowledge at level 0 consists of the collection of all
instances of each representation.
	The base of %2representation-specific%* knowledge at level 1 consists of
the schemata, which are, in effect, the declarations of the extended data types.
These have a degree of domain independence since they describe what an
attribute is, what a value is, etc., without requiring %2a priori%* knowledge
of the domain in which those descriptions will be instantiated.
	The base of %2representation-independent%* knowledge at level 2--the
schema-schema--describes what a declaration looks like.  At this level
resides knowledge about representations in general and about the process of
specifying them via declarations.
.BEGINLIST;INDENT 3,12,8; TABS 9;
	(0)\The knowledge base of the performance program contains:
	\%2object-level%* knowledge that is
	\%2domain specific%* and is formed by
	\instantiating the appropriate schema to form a %2new instance of an existing conceptual 
primitive%*.
.SKIP 1;
	(1)\The knowledge about representations (the schemata) contains:
	\%2meta-level%* knowledge that is
	\%2representation specific%* and is formed by
	\instantiating the schema-schema to form a %2new type of conceptual primitive%*.
.SKIP 1;
	(2)\The schema-schema contains:
	\%2second order meta-level%* knowledge that is
	\%2representation independent%* and is formed by 
	\hand.

.FIG(Levels of generality of knowledge about representations,LOGFIG:);
.ENDLIST;
	While level 2 is formed by hand, it is the only body of knowledge
in the system for which this is true, and it forms a small core of
knowledge from which everything else can be built.  For example, the schema
hierarchy shown in {YONFIG SHIER} (and all associated structures) was
constructed by bootstrapping from the schema-schema and the core of
structures noted in {YON2 COTT}.  The single process of schema
interpretation was used to guide the construction of the base of
representation-specific knowledge (the hierarchy and schemata) and then
used to instantiate it to build a small object-level knowledge base.
	One reason that this is a practical approach is the great leverage in the
notion of a schema as a prototype.  The current performance program, for
instance, contains knowledge about some 125 organisms, but a single schema
serves to characterize every one of them.  There are some 25 different
representations in the program, requiring 25 schemata; yet a single
schema-schema serves to characterize all of them.
	It was, in fact, precisely such utilitarian considerations that motivated
the initial creation of the schema-schema.  Recall that the schemata were
developed because there were many details involved in creating a new object and
adding it to the system.  But there turned out to be a large number of details
involved in creating all the necessary schemata, too.  The schema-schema was
thus the result of the straightforward recursive application of the basic idea, 
for precisely the same reason.

.SSS(Impact);
	The direct advantages of this  stratification arise from the
capabilities it   supports.  The compartmentalization of knowledge suggested
by the levels of generality, for instance, provides an increased range of
applicability of the system.  The single schema-instantiation process can
be used with the core of representation-specific knowledge in a range of
different domains, or it can be used with the representation-independent
knowledge over a range of representations.  Describing representations at
different levels of detail, on the other hand, offers a framework for
organizing and keeping track of the required information.  It also provides
a useful degree of flexibility in the system, because the multiple levels
of description insulate changes at one level from the other levels.  Thus,
in the same way that
modifications to information associated with the slotnames can change
the implementation of a representation without impacting its logical
structure (exactly in the manner of record structures [[Balzer67]), so
changes can be made to logical structure (the schemata) without impacting
the global organization of representations (the hierarchy).∪∪Changes at one
level may quite possibly require additional changes at that same level in
order to maintain consistency of data structure specifications or to
assure the continued operation of the program.  But by organizing the
information in the levels described, the effects of changes will not
propagate to the other levels of description.∪
.endind levels of knowledge
	In a more general sense, both stratifications provide guidance in
using the representational machinery proposed above.  In both cases we have
a set of general guidelines that suggests the appropriate mechanism to use
for each of the forms of knowledge necessary for the acquisition task.
These guidelines ({YONFIG KLEVELS} and {YONFIG LOGFIG}) deal with dimensions of
knowledge organization that are broadly applicable and hence are not limited
to a single domain of application nor to
a single representational formalism.  They thus help to "make sense" of
the representation scheme outlined here.

.SS(Limitations);
	One of the primary shortcomings of the current implementation is
the simplicity of the structure syntax.  While the %2slotname-blank-advice%*
triples can be combined in various ways and the %2blank%* is capable of
describing a range of structures, the result is still somewhat rudimentary.
The schemata need a more powerful language for describing the "shape" of
data structures before they can be widely applicable.
	More fundamental limitations arise out of
the organization of the slotexperts.  They rely, for instance, on the
assumption that knowledge about the representations being described can be
broken down into basically independent chunks indexed by the slotname and
advice.  This requires a certain modularity in the design of a representation
that is not always possible to supply.
	Independence of slotnames implies that a representation
can be decomposed into a collection of independent subparts, and this is not
always true.  While the current implementation is able to deal with a limited
amount of interdependence between slots, more complex interdependencies do not
appear to be accommodated easily.  The current implementation can, for example,
make it possible for one slot to use the contents of another but cannot deal
with the situation in which the contents of one slot restrict the set of
admissible contents of another.
	This inability to deal with more complex interrelationships of
representations is currently the system's primary shortcoming.  Related attempts
to formalize such information have come from many directions (e.g.,
[[Spitzen75], [[Stonebreaker75], [[Suzuki76]) and have encountered similar
difficulties.  Specifying complex integrity constraints is fundamentally a
problem of knowledge representation and confronts many of the same difficult
issues.
	Another limitation arises from the assumption in the design of
the %2advice%* concept that the question of where to get
the information to fill a slot can be broken down into a collection of
cases that are (%2i%*) broadly applicable and (%2ii%*) independent.  The
issue is not so much precisely which cases are chosen, but that some set of
them can be assembled that will provide a "language" for designating the
sources of knowledge used in creating a data structure. The range of
application of the set determines the ease with which the whole approach
can be used. If, for instance, there are important differences in the
implications of %ACREATEIT%* for two different slotexperts, then whoever
constructs a schema has to know this fact.  But then little is gained by the
whole approach.  It becomes far less transparent, and the slotname/advice
indexing scheme becomes an obscure way of invoking particular pieces of
code.  More serious problems would arise if the question of where to get
the information to fill the slots could not even be decomposed into any set
of distinct, independent cases.
	Several of the traces demonstrated that the acquisition of one kind of
data structure can lead to acquisition of another, as in the case of an attribute
leading to its associated values.  This is a useful feature, since it means
that the
system tends to request coherent blocks of information from the expert.  It
depends, however, on explicit structural interconnections between data
types--the attribute leads to acquiring values because those values are part of
its structure.  Had this not been the case, the link would not have been made
and the system would have acquired each new value as it was mentioned.  This
means that the design of the representations used can have an important impact
on the coherence of the acquisition dialogs.
	Requesting the expert's help in descending the schema network assumes
that the display of the alternative paths will be comprehensible to him.  This
presumes a correspondence between the representations in the program and objects
in the application domain.  While it does seem likely that such a correspondence
will exist, its absence would present a significant problem for our system.
	While the schemata make possible a number of useful features, they are not
without associated costs.  Most of this cost tends to be borne by the system
designer, to the benefit of the expert who wants to augment the knowledge base.
This is because the schemata impose a certain discipline on the system designer,
requiring, in particular, that he  view the representations in the system in fairly
general terms and  fit them into the framework provided.  While there are
advantages to doing this, it may not be an easy task.  Especially during the
early stages of system design, when numerous changes are made, the cost may
outweigh its advantages.  In their present state of development, then, the
tools described in this chapter are more appropriate to ongoing knowledge base
maintenance than to the initial phases of knowledge base construction.
	Perhaps the most general limitation of the techniques outlined here
concerns the conceptual level of the system's task.  It is no accident that we
have emphasized at many points the use of a high-level language and the
manipulation of extended data structures that correspond to objects in the
world being modeled.  There clearly are tasks and system designs for which this
correspondence cannot be maintained.  However, to the extent that a system can
be viewed successfully in these high-level terms, the methodology can be very
useful.  In general, the higher the level of the language and programming, the
more applicable the techniques will be.
	Finally, the current system cannot yet acquire new objects 
(i.e., "objects" as in the second part of an attribute-object-value triple)
or new predicate
functions.  Of these, the latter is more difficult, since it is basically a
problem in automatic programming and no attempt has been made to solve it.
Objects present a different challenge, since they are represented by some
highly convoluted data structures (they are designed currently for maximum
efficiency, at the price of comprehensibility).  Schema syntax
will have to be extended before it is capable of describing them, but it is not
clear whether this arises solely from a shortcoming in the expressive power of
the schemata or whether the convoluted design of objects contributes to the
problem as well.  It is not reasonable, of course, to design a language and then
claim that anything it cannot express should not be said.  But every language
carries its own perspective, and the schemata stress the simplicity of design
that arises from decomposability.  (As discussed above, they currently rely
%2too%* heavily on this.)  One of the potential long-term benefits of a
representation language, however, is as a vehicle for developing and formalizing
principles of good design.  Given a language based on such principles, it might
then be said with some justification that what could not easily be stated might
profit from reconsideration.  The schemata are, naturally, only a single step in
this direction and much more work is needed.

.SS(Future work);
.SSS(Minor extensions);
	As is apparent from a number of the traces, the system's "depth-first"
approach to acquisition can be difficult to follow.   Despite the messages
printed by the system and the indication of level given by the
indentation of the dialog, it is not always easy to remember which problem
the system is returning to after it finishes up with a subproblem.  As noted, this
could  be solved by using a modified breadth-first search, in which the
system finished acquiring all the necessary components at its current level
before taking up any subproblems.  In acquiring a new attribute, for
example, this would mean that acquisition of the new attribute would be finished
before starting to deal with its associated values.
	It should also be possible to suspend the
acquisition task temporarily.  The expert can at times find himself involved in a protracted
dialog that is not immediately relevant to the bug he started to correct.  All
the information requested will prove necessary eventually, but it may prove to be
an unreasonable distraction to have to deal with every detail before getting
back to the original problem.
	The system is currently an effective listener for anyone who knows just
what he wants to say, but it is not at all forgiving.  It enforces a one-pass,
"get-it-all-right-the-first-time" approach, and this is clearly an unrealistic view
of knowledge base development.  For example, the schema network may require
reorganization as a result of several causes.  This may become
necessary because of mistakes in describing the schemata originally, because
further development of the performance program dictates redesign of some
representations, or because the addition of a new schema to the network requires
it in order to maintain the proper inheritance of properties.  This makes the
network editor a good candidate for additional work.  As currently implemented,
it does not offer any mechanism for reorganizing existing schemata;  to be a
truly useful maintenance tool, it should be extended to provide a wider range of
such capabilities.  (See [[Sandewall75] for some suggestions on similar data base
reorganization problems.)
	Once it becomes possible to modify existing representations, there is
an auxiliary capability that would prove extremely useful.  After the user has
finished modifying any schema, the editor should be prepared to execute those
same modifications on all current instances of the schema.  That is, the system
should "look over the user's shoulder" and then make the same changes to all
instances of the schema.  Simple deletions or reorganizations could be performed
unaided; where new components had been added, the system would prompt for the
appropriate entry for each instance.  This would allow extensive changes in
representation design with relatively little effort and a reduced probability
of introducing errors.
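	In rough outline, such a capability might look like the following (a
Python sketch; the names and the division of labor between system and user are
hypothetical).
.STARTFIG;
# Rough sketch of the proposed capability: after a schema has been edited,
# replay the structural changes over every current instance, prompting only
# where a newly added component needs contents.

def propagate(old_schema, new_schema, instances, ask):
    added   = set(new_schema["SLOTS"]) - set(old_schema["SLOTS"])
    deleted = set(old_schema["SLOTS"]) - set(new_schema["SLOTS"])
    for inst in instances:
        for slot in deleted:
            inst.pop(slot, None)                     # deletions are unaided
        for slot in added:
            inst[slot] = ask(inst["PNTNAME"], slot)  # additions need the user
.FIG(A sketch of propagating schema edits to instances);
.ENDFIG;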

.SSS(Major extensions);
	Perhaps the most interesting major extension to the system would
be the addition of semantic information to the schemata.  They were designed
originally to convey the syntax of data structures, but as {YON2 WTS}
illustrated, inclusion of semantic information would prove very useful--it
would make the system appear "smarter" by allowing it to take advantage of
context from the debugging dialog to guide its own descent through the schema
network.  Representation of the semantics might be based initially on more
extensive use of patterns like those described above, but more sophisticated
mechanisms should eventually be devised.

.SS(Summary,SUM6:);
.SSS(Review of major concepts,REVIEW6:);
	At the beginning of this chapter we suggested that it would be
instructive to consider the terms %2knowledge representation, extended data
type, %*and%2 data structure%* as equivalent, to see what might be learned
by viewing each of them in the perspective normally reserved for one of the
others.  A number of the key ideas involved in the design and use of the
schemata were inspired by this mixing of perspectives.
	The fundamental idea of %2a base of knowledge about
representations%*, for instance, was suggested by the view of
representations as extended data types and motivated by the desire to
organize and represent knowledge about those data types.  This led  to the
idea of the %2schemata as a language and mechanism for describing
representations%*, and it strongly influenced schema design by indicating
what sort of information they ought to contain (e.g., structure and
interrelationships).  This view also suggested the organization of that
information and led to %2organizing it around representational
primitives%* (e.g., attribute, object, value, etc.), which were, in turn,
%2represented as prototypes%* (the schemata) and %2instantiated to drive
the interactive transfer of expertise process%*.
	Viewing extended data types from the perspective of knowledge
representations led to incorporating  the %2advice%*
mechanism in those data types.  This provided an additional source of knowledge about those
structures and allowed a "high-level" dialog that was coherent to the
domain expert.
	Blurring the distinction between data type and knowledge
representation offered an interesting consideration for knowledge base
design.  To see how this consideration arose, note that a subtle factor
that added to the coherence of the acquisition dialogs earlier was the
somewhat fortuitous correspondence between data structures and
domain-specific objects (e.g., organisms).  This meant that the acquisition
dialog appeared to the expert to be phrased in terms of objects in the
domain, while to the system it was a straightforward manipulation of data
structures.  Such a correspondence helps to bridge the gap in perspectives,
and the purposeful attempt to insure its presence in a system can be a very
simple, but useful consideration in the initial design of a knowledge base.
	A second set of major ideas involved in the schemata arose from the
notion of %2levels of knowledge%* described above.  As noted, this stratification
of knowledge provided an %2increased range of applicability%* for the
techniques and offered a %2set of guidelines for organizing the body of
knowledge about representations%*.  It also suggested that  the
acquisition of
new instances be viewed as a process of %2descent through the schema hierarchy%*,
and that the acquisition of new kinds
of knowledge representations be viewed as a
process of %2adding new branches to the hierarchy%*.

.SSS(Current capabilities,CURCAP:);
	The schemata and associated structures offer a language and framework in which
representations can be described.  This language strongly emphasizes making
explicit the many different kinds of knowledge about representations and offers a
framework for organizing that information.  The schema hierarchy, individual schemata, and
slotnames each support their own variety of that knowledge.  The result can be a
useful global overview of the organization and design of all the representations
in the system.
	For both the system engineer and the applications domain expert, the
knowledge acquisition capabilities of the schemata offer a very organized and
thorough assistant that can:
.BEGINLIST;
	(a)\%2attend to many routine details%*,
.CONTINUE;
Some of these are details of data
structure management, and having the system attend to them means the expert need
know nothing about programming. Others are details of organization and format,
and with these out of the way, the task of specifying large amounts of knowledge
becomes a good deal easier.  The emphasis can then be placed on specifying its
content rather than attending to details of format.
.SKIP 1;
	(b)\%2show how knowledge should be specified, and%*
.CONTINUE;
In terms of the three systems
pictured earlier, the assistant's intelligence always lies at the level above
that of the knowledge being specified.  While it cannot choose a
representation for an organism, it can indicate how the representation should be
specified.  Similarly it cannot suggest what the gramstain of a new organism
might be, but it can indicate that every organism must have one, and can
describe exactly how it should be specified.  It is this ability to structure
the task and lead the user through it that is most useful.
.SKIP 1;
	(c)\%2make sure that the user is reminded of all the items he has to supply%*.
.CONTINUE;
Since knowledge base construction is viewed as a process of knowledge transfer,
the assistant's thoroughness offers some assurance that the transfer operations
will not inadvertently be left incomplete.
.ENDLIST;
	In summary, the assistant cannot supply answers, but it does know what
all the proper questions are and what constitutes a syntactically valid answer for each.  The
application domain expert will rely on the assistant to show him how to transfer
his knowledge to the program, while the system designer can use the
assistant as an aid in knowledge base management, using it to help him keep
track of the large number of representations that may accumulate during the
construction of any sizable program.
	All of this should make plausible the suggestion that the tools discussed
above, when combined with a simple core of representation-independent
information, offer a basis for assembling a sizable collection of knowledge.
They provide, as well, a useful perspective on the organization and representation
of several levels of knowledge, making the transfer process straightforward and
effective.